A Variable Precision Attribute Reduction Approach in Multilabel Decision Tables
Owing to the high dimensionality of multilabel data, feature selection is necessary in multilabel learning to remove redundant features and improve the performance of multilabel classification. Rough set theory, as a valid mathematical tool for data analysis, has been widely applied to feature selection (also called attribute reduction). In this study, we propose a variable precision attribute reduct for multilabel data based on rough set theory, called the $\delta$-confidence reduct, which can correctly capture the uncertainty implied among labels. Furthermore, the judgement theorem and discernibility matrix associated with the $\delta$-confidence reduct are also introduced, from which we can obtain an approach to knowledge reduction in multilabel decision tables.
1. Introduction
Conventional supervised learning deals with single-label data, where each instance is associated with a single class label. However, in many real-world tasks, one instance may simultaneously belong to multiple class labels. For example, in text categorization problems, a document may be labeled with several predefined topics, such as religion and politics [1]; in image annotation problems, a photograph may be associated with more than one tag, such as elephant, jungle, and Africa [2]; in functional genomics, each gene may be related to a set of functional classes, such as metabolism, transcription, and protein synthesis [3]. Such data are called multilabel data.
Owing to the high dimensionality of multilabel data, feature selection is necessary in multilabel learning to remove redundant features and improve the performance of multilabel classification. Among various feature selection approaches, rough set theory, proposed by Pawlak [4], has attracted much attention due to its special advantage, that is, the capability of studying imprecise, incomplete, or vague information without requiring prior information.
Feature selection in rough set theory is also called attribute reduction. Generally speaking, attribute reduction can be interpreted as a process of finding a minimal set of attributes that preserves or improves one or several criteria. Such a minimal set of attributes is called an attribute reduct. In the past few years, many researchers have worked on attribute reduction, and important results are summarized in [5, 6]. The idea of attribute reduction using the positive region originated in [7, 8], aiming to remove as many redundant attributes as possible while retaining the so-called positive region. Afterwards, Ziarko introduced the variable precision rough set model and the $\beta$-reduct to improve the ability of modeling uncertain information [9]. Furthermore, Kryszkiewicz proposed five kinds of attribute reducts for inconsistent information systems [10], and the relationships among these five reducts and some related results were reconsidered and rectified in [11]. Applying the discernibility matrix, Skowron and Rauszer [12] proposed an attribute reduction algorithm based on computing a disjunctive normal form, which is able to obtain all attribute reducts of a given information system. On the other hand, for obtaining a single reduct from a given information system in a relatively short time, many heuristic attribute reduction algorithms have been developed. In order to reduce computational time, Xu et al. [13] proposed a quick attribute reduction algorithm with complexity of $\max(O(|C||U|), O(|C|^{2}|U/C|))$. Further, Qian et al. [14] developed a common accelerator based on four kinds of heuristic reduction algorithms to improve the time efficiency of a heuristic search process.
As far as we know, however, little work has been done on applying rough set theory to feature selection in multilabel learning. Although the existing attribute reduction methods can be applied directly to multilabel data, doing so does not take into account the uncertainty conveyed by labels and thus leaves room for improvement. In this paper, we propose a new attribute reduct for multilabel data, namely, the $\delta$-confidence reduct, which overcomes the limitations of existing attribute reduction methods on multilabel data. Furthermore, the judgement theorem and discernibility matrix associated with the $\delta$-confidence reduct are also established. These results provide approaches to knowledge reduction for multilabel data, which are significant from both theoretical and applied perspectives.
The rest of this paper is organized as follows. Some basic notions in rough set theory are briefly reviewed in Section 2. Section 3 introduces the multilabel decision table and analyzes the limitations of the existing attribute reduction methods on multilabel data. In Section 4, the new attribute reduct, the $\delta$-confidence reduct, is proposed and the corresponding judgement theorem and discernibility matrix are introduced. A worked example is also given to illustrate our approach. Finally, in Section 5, we conclude the paper with a summary and an outlook for further research.
2. Preliminaries
In this section, we will review several basic concepts in rough set theory.
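A decision table is an information system $S = (U, A \cup D)$ with $A \cap D = \emptyset$, where $U$ is a nonempty, finite set of objects called the universe; $A$ is a nonempty, finite set of condition attributes; $D$ is a nonempty, finite set of decision attributes. Each nonempty subset $B \subseteq A$ determines an indiscernibility relation in the following way:
$$R_B = \{(x, y) \in U \times U : a(x) = a(y) \text{ for all } a \in B\}.$$
The indiscernibility relation $R_B$ partitions $U$ into equivalence classes given by $U/R_B = \{[x]_B : x \in U\}$, where $[x]_B$ denotes the equivalence class determined by $x$ with respect to $B$; that is,
$$[x]_B = \{y \in U : (x, y) \in R_B\}.$$
Let $B \subseteq A$ and $X \subseteq U$. One can define a lower approximation of $X$ and an upper approximation of $X$ by
$$\underline{R_B}(X) = \{x \in U : [x]_B \subseteq X\}, \qquad \overline{R_B}(X) = \{x \in U : [x]_B \cap X \neq \emptyset\},$$
respectively. The lower approximation is called the positive region of $X$ and denoted alternatively as $POS_B(X)$. If $\underline{R_B}(X) \neq \overline{R_B}(X)$, then $X$ is called a rough set.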
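As a minimal sketch of these notions, the following Python snippet builds the partition induced by a set of attributes and derives the lower and upper approximations of a target set. The toy table, the attribute names `a1` and `a2`, and the helper names `partition` and `lower_upper` are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes: objects that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def lower_upper(table, attrs, target):
    """Lower and upper approximations of target under the attributes attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper

# Hypothetical toy table (not from the paper).
table = {
    "x1": {"a1": 0, "a2": 0},
    "x2": {"a1": 0, "a2": 0},
    "x3": {"a1": 0, "a2": 1},
    "x4": {"a1": 1, "a2": 1},
}
target = {"x1", "x3"}
lower, upper = lower_upper(table, ["a1", "a2"], target)
is_rough = lower != upper
```

In this toy table the class containing `x1` and `x2` straddles the target, so it contributes only to the upper approximation and the target is a rough set.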
Attribute reduction is one of the most important topics in rough set theory; it aims to delete irrelevant or redundant attributes while retaining the discernibility of the original attributes. Among the many attribute reduction methods, the positive region reduct [7, 8] is a representative one.
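Definition 1. Let $S = (U, A \cup D)$ be a decision table and $B \subseteq A$. $B$ is a positive region reduct of $S$ if and only if $B$ satisfies the following conditions:
(1) $POS_B(D) = POS_A(D)$,
(2) for any $a \in B$, $POS_{B \setminus \{a\}}(D) \neq POS_B(D)$,
where $POS_B(D) = \bigcup_{i=1}^{r} \underline{R_B}(Y_i)$ and $Y_1, \ldots, Y_r$ are the equivalence classes, called decision classes, generated by the indiscernibility relation $R_D$.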
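The check in Definition 1 can be sketched in Python as follows. The sketch assumes the two conditions are the usual ones: the subset yields the same positive region as the full condition attribute set, and dropping any single attribute from the subset changes that positive region. The toy table and the function names are hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes: objects that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def positive_region(table, cond, dec):
    """Union of the condition classes that fit inside a single decision class."""
    decision_classes = partition(table, dec)
    pos = set()
    for block in partition(table, cond):
        if any(block <= d for d in decision_classes):
            pos |= block
    return pos

def is_positive_region_reduct(table, subset, all_cond, dec):
    """Assumed conditions: same positive region as all_cond, and no attribute is removable."""
    full = positive_region(table, all_cond, dec)
    if positive_region(table, subset, dec) != full:
        return False
    return all(
        positive_region(table, [b for b in subset if b != a], dec) != full
        for a in subset
    )

# Hypothetical toy decision table: a1 alone determines the decision d.
table = {
    "x1": {"a1": 0, "a2": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "d": 0},
    "x3": {"a1": 1, "a2": 0, "d": 1},
    "x4": {"a1": 1, "a2": 1, "d": 1},
}
cond, dec = ["a1", "a2"], ["d"]
a1_is_reduct = is_positive_region_reduct(table, ["a1"], cond, dec)
a2_is_reduct = is_positive_region_reduct(table, ["a2"], cond, dec)
both_is_reduct = is_positive_region_reduct(table, ["a1", "a2"], cond, dec)
```

Only `["a1"]` passes: `["a2"]` loses the positive region, and `["a1", "a2"]` keeps it but is not minimal because `a2` can be dropped.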
3. The Multilabel Data
In this section, we first introduce the multilabel decision table and then analyze the limitations of existing attribute reduction approaches to multilabel data.
3.1. The Multilabel Data
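Multilabel data can be characterized by a multilabel decision table $MDT = \langle U, A, F, L, G \rangle$ with $A \cap L = \emptyset$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty finite set of objects, called the universe; $A = \{a_1, a_2, \ldots, a_p\}$ is a nonempty finite set of condition attributes, called the condition attribute set; $F = \{f_k : U \to V_{a_k}\}$ is a set of information functions with respect to condition attributes and $V_{a_k}$ is the domain of $a_k$; $L = \{l_1, l_2, \ldots, l_m\}$ is a nonempty finite set of possible labels, called the label set; $G = \{g_j : U \to V_{l_j}\}$ is a set of information functions with respect to labels and $V_{l_j} = \{0, 1\}$ is the domain of the label $l_j$. If the object $x$ is associated with label $l_j$, then $g_j(x) = 1$; otherwise $g_j(x) = 0$. The 5-tuple $\langle U, A, F, L, G \rangle$ can be expressed more simply as $(U, A, L)$ if $F$ and $G$ are understood.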
Some conventions in multilabel learning are as follows.
(1) An object having no labels is irrelevant to multilabel learning and thus is not taken into account in this setting [15, 16]. Note that this convention is a prerequisite for the proposed approach, as discussed in Section 4.
(2) Each label from the label set is associated with at least one object in the universe.
The following example depicts a multilabel decision table in more detail.
Example 2. A multilabel decision table is presented in Table 1, which is a part of document topic classification problem. It consists of nine documents that belong to one or more of three labels: religion, science, and politics. It can be seen that , , and . Note that each object in is associated with at least one label from and each label from is associated with at least one object in .
3.2. The Limitations of Existing Attribute Reduction Approaches to Multilabel Data
In this section, we mainly analyze the limitations of existing attribute reduction approaches to multilabel data.
For a multilabel decision table $(U, A, L)$, each label attribute can be viewed as a binary decision attribute, and the label set then forms an indiscernibility relation as follows:
$$R_L = \{(x, y) \in U \times U : g_j(x) = g_j(y) \text{ for all } l_j \in L\},$$
where $g_j(x) \in \{0, 1\}$ is the value of label $l_j$ on object $x$. $R_L$ partitions $U$ into a family of equivalence classes given by $U/R_L$. In this case, most existing attribute reduction approaches can be directly applied to multilabel data. Here we consider, for instance, the positive region reduct, to delete redundant condition attributes in multilabel decision tables. The following example illustrates this process.
Example 3. For the multilabel decision table given by Table 1, we can conclude that
Thus, we have . It means that the other equivalence classes , , and in are all uncertain with respect to the label set . For instance, consider the equivalence class . Notice that and are indiscernible with respect to while their respect label sets and are discernible with respect to . This means is uncertain with respect to the label set . Furthermore, we can calculate that
Since , , and are all uncertain with respect to , they can be safely merged without any information loss. In other words, removing the attribute or is valid from the perspective of rough sets. Moreover, one can check that no more attributes can be removed from either or ; so and are both positive region reducts.
However, notice that all objects in must be associated with label and may be associated with label in the probability of and must not be associated with label , whereas all objects in must not be associated with label and must be associated with label and may be associated with label in the probability of . Thus, the uncertainty of and is different, and the equivalence class , the union of and , cannot preserve the uncertainty conveyed by labels. This implies that is not an appropriate attribute reduct. Similarly, is also not an appropriate attribute reduct for multilabel data.
Through the above analysis, we know that some positive region reducts cannot preserve the uncertainty implied among labels for multilabel data. In fact, since the computation of the positive region reduct relies on the indiscernibility relation $R_L$ induced by the label set, the uncertainty conveyed by labels may not be analyzed thoroughly. Furthermore, the uncertainty characterized by $R_L$ is also what the other existing attribute reduction methods consider, so they share the same limitation for multilabel data as the positive region reduct. Thus it is necessary to reconsider attribute reduction methods for multilabel data.
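The limitation can be reproduced on a small scale. The sketch below uses a hypothetical six-object table (it is not Table 1 of the paper, and all names are assumptions) in which keeping only attribute `a1` preserves the positive region with respect to the label vectors, yet merges two classes whose label frequencies differ.

```python
from collections import defaultdict

# Hypothetical toy data (not Table 1 of the paper).
CONDITIONS = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 0, "a2": 1, "a3": 0},
    "x4": {"a1": 0, "a2": 1, "a3": 0},
    "x5": {"a1": 1, "a2": 0, "a3": 0},
    "x6": {"a1": 1, "a2": 0, "a3": 1},
}
LABELS = {
    "x1": {"l1"}, "x2": {"l1", "l2"}, "x3": {"l2"},
    "x4": {"l2", "l3"}, "x5": {"l3"}, "x6": {"l3"},
}
ATTRS = ["a1", "a2", "a3"]

def partition(attrs):
    """Classes of objects that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in CONDITIONS.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def positive_region(attrs):
    """Objects whose condition class carries one identical label set."""
    pos = set()
    for block in partition(attrs):
        if len({frozenset(LABELS[o]) for o in block}) == 1:
            pos |= block
    return pos

def label_frequencies(block):
    """Fraction of objects in block that carry each label."""
    names = sorted(set().union(*LABELS.values()))
    return {l: sum(l in LABELS[o] for o in block) / len(block) for l in names}

keeps_positive_region = positive_region(["a1"]) == positive_region(ATTRS)
before = label_frequencies({"x1", "x2"})             # a class under all attributes
after = label_frequencies({"x1", "x2", "x3", "x4"})  # the merged class under a1 only
```

Although the positive region is unchanged, label `l1` drops from certain to a frequency of one half once the classes are merged, which is the kind of information loss discussed above.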
4. The New Attribute Reduction Approach in Multilabel Data Decision Tables
In this section, we introduce a new attribute reduct referred to as the $\delta$-confidence reduct and show some advantages of the $\delta$-confidence reduct in unraveling the uncertainty of multilabel data. Moreover, the judgement theorem and discernibility matrix associated with the $\delta$-confidence reduct are also established.
4.1. $\delta$-Confidence Reduct in Multilabel Decision Tables
First, we present the following definition.
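Definition 4. Let $(U, A, L)$ be a multilabel decision table, where $U = \{x_1, x_2, \ldots, x_n\}$ and $L = \{l_1, l_2, \ldots, l_m\}$. For each label $l_j \in L$, one defines the label decision set $D_j$ as the collection of all possible objects having the label:
$$D_j = \{x \in U : g_j(x) = 1\}.$$
Considering Convention 1 of multilabel learning, one has $\bigcup_{j=1}^{m} D_j = U$; that is, $D_1, D_2, \ldots, D_m$ form a cover of $U$.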
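A short Python sketch of the label decision sets follows. The label assignment is a hypothetical toy example, and `label_decision_sets` is an illustrative helper name, not one from the paper.

```python
# Hypothetical toy label assignment (not Table 1 of the paper).
LABELS = {
    "x1": {"l1"}, "x2": {"l1", "l2"}, "x3": {"l2"},
    "x4": {"l2", "l3"}, "x5": {"l3"}, "x6": {"l3"},
}

def label_decision_sets(labels):
    """Map each label to the set of objects that carry it."""
    sets = {}
    for obj, assigned in labels.items():
        for label in assigned:
            sets.setdefault(label, set()).add(obj)
    return sets

decision_sets = label_decision_sets(LABELS)
# Every object has at least one label, so the sets cover the universe.
covers_universe = set().union(*decision_sets.values()) == set(LABELS)
# The sets may overlap, so they form a cover rather than a partition.
overlap = decision_sets["l1"] & decision_sets["l2"]
```

The overlap on `x2` shows why the label decision sets form a cover and not a partition: an object with several labels belongs to several sets.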
In the following, we present a particular function to characterize the uncertainty implied among labels.
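Definition 5. Let $(U, A, L)$ be a multilabel decision table, let $\mathcal{P}(L)$ be the power set of the label set $L$, and let $D_1, D_2, \ldots, D_m$ be the label decision sets. Given a subset $B \subseteq A$ and a threshold $\delta \in (0, 1]$, one defines a $\delta$-confidence label function $\lambda_B^{\delta} : U \to \mathcal{P}(L)$, as follows:
$$\lambda_B^{\delta}(x) = \left\{ l_j \in L : \frac{|[x]_B \cap D_j|}{|[x]_B|} \geq \delta \right\},$$
where $[x]_B$ is the equivalence class of $x$ with respect to $B$.
The $\delta$-confidence label function $\lambda_B^{\delta}(x)$ is the collection of the labels that are associated with at least $\delta\,|[x]_B|$ objects in $[x]_B$. In other words, $\lambda_B^{\delta}(x)$ is the collection of the labels that are associated with each object in $[x]_B$ at a confidence level of at least $\delta$.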
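A minimal Python sketch of the confidence label function described above is given below. It assumes the function assigns to each object the labels carried by at least a given fraction of the objects in its equivalence class; the threshold is named `delta` here, and the toy data and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not Table 1 of the paper).
CONDITIONS = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 0, "a2": 1, "a3": 0},
    "x4": {"a1": 0, "a2": 1, "a3": 0},
    "x5": {"a1": 1, "a2": 0, "a3": 0},
    "x6": {"a1": 1, "a2": 0, "a3": 1},
}
LABELS = {
    "x1": {"l1"}, "x2": {"l1", "l2"}, "x3": {"l2"},
    "x4": {"l2", "l3"}, "x5": {"l3"}, "x6": {"l3"},
}
ATTRS = ["a1", "a2", "a3"]

def confidence_labels(conditions, labels, attrs, delta):
    """For each object, the labels carried by at least delta of its class."""
    blocks = defaultdict(set)
    for obj, row in conditions.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    all_labels = set().union(*labels.values())
    result = {}
    for block in blocks.values():
        kept = frozenset(
            l for l in all_labels
            if sum(l in labels[o] for o in block) >= delta * len(block)
        )
        for obj in block:
            result[obj] = kept
    return result

half = confidence_labels(CONDITIONS, LABELS, ATTRS, 0.5)
strict = confidence_labels(CONDITIONS, LABELS, ATTRS, 1.0)
```

Raising the threshold can only shrink the label sets: with `delta = 0.5` the class of `x1` and `x2` keeps both `l1` and `l2`, while with `delta = 1.0` only `l1`, which both objects carry, survives.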
Example 6. Consider the multilabel decision table given by Table 1. If , then the -confidence label function with respect to attribute set A can be calculated that
We have the following property.
Theorem 7. Let be a multilabel decision table, . Then(1)if , then ;(2)if , then ;(3)for any , ;(4)if , then .
Proof. (1) Let . Then we have . Note that ; thus . It means that . Therefore .
(2) Since , we have .
If , then . Since , we have . Thus . Therefore .
According to Theorem 7 (1), we have .
If , then . Since , we have . That means . Therefore .
(3) If there exists such that , then , . Thus . Since , we have . It is a contradiction.
(4) It is straightforward by the definition of and .
Now we define the consistent $\delta$-confidence set using the $\delta$-confidence label function. Furthermore, we present the definition of the new attribute reduct.
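Definition 8. Let $(U, A, L)$ be a multilabel decision table and $B \subseteq A$. If $\lambda_B^{\delta}(x) = \lambda_A^{\delta}(x)$ for all $x \in U$, where $\lambda_B^{\delta}$ denotes the $\delta$-confidence label function with respect to $B$, one says that $B$ is a consistent $\delta$-confidence set of $(U, A, L)$. If $B$ is a consistent $\delta$-confidence set and no proper subset of $B$ is a consistent $\delta$-confidence set, then $B$ is called a $\delta$-confidence reduct of $(U, A, L)$.
A $\delta$-confidence reduct is a minimal set of condition attributes that leaves the $\delta$-confidence label function of every object in $U$ unchanged.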
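The definition can be turned into a brute-force search, sketched below in Python. The sketch enumerates attribute subsets by increasing size and keeps those that are consistent and contain no smaller consistent subset. It is exponential in the number of attributes and is meant only as an illustration on hypothetical toy data; the threshold name `delta` and all function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not Table 1 of the paper).
CONDITIONS = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 0, "a2": 1, "a3": 0},
    "x4": {"a1": 0, "a2": 1, "a3": 0},
    "x5": {"a1": 1, "a2": 0, "a3": 0},
    "x6": {"a1": 1, "a2": 0, "a3": 1},
}
LABELS = {
    "x1": {"l1"}, "x2": {"l1", "l2"}, "x3": {"l2"},
    "x4": {"l2", "l3"}, "x5": {"l3"}, "x6": {"l3"},
}
ATTRS = ["a1", "a2", "a3"]

def confidence_labels(attrs, delta):
    """For each object, the labels carried by at least delta of its class."""
    blocks = defaultdict(set)
    for obj, row in CONDITIONS.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    all_labels = set().union(*LABELS.values())
    result = {}
    for block in blocks.values():
        kept = frozenset(
            l for l in all_labels
            if sum(l in LABELS[o] for o in block) >= delta * len(block)
        )
        for obj in block:
            result[obj] = kept
    return result

def is_consistent(subset, delta):
    """True if subset gives every object the same label set as all attributes."""
    return confidence_labels(subset, delta) == confidence_labels(ATTRS, delta)

def confidence_reducts(delta):
    """All minimal consistent attribute subsets, found by exhaustive search."""
    reducts = []
    for size in range(1, len(ATTRS) + 1):
        for subset in combinations(ATTRS, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # contains a smaller consistent set, so not minimal
            if is_consistent(list(subset), delta):
                reducts.append(subset)
    return reducts

reducts = confidence_reducts(0.5)
```

On this toy table the only reduct at `delta = 0.5` is `("a1", "a2")`: attribute `a3` merely separates two objects with identical label sets and can be dropped, whereas dropping `a2` would merge classes with different label sets.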
Example 9 (continued from Example 6.). For the multilabel decision table given by Table 1, we have
Therefore, we obtain the unique -confidence reduct: .
Considering Example 3, however, we know that and are two positive region reducts for the same multilabel decision table. We think -confidence reduct is more appropriate for multilabel data than positive region reduct. This is because -confidence label function can more reasonably characterize the uncertainty implied among labels than the indiscernibility relation .
Note that the uncertainty characterized by $R_L$, the indiscernibility relation induced by the label set, is also what the other existing attribute reduction methods consider. Therefore, for multilabel data, the $\delta$-confidence reduct has significant advantages when compared with existing attribute reduction methods.
4.2. Discernibility Matrix of $\delta$-Confidence Reduct
This section provides a discernibility matrix approach [12] to obtain all $\delta$-confidence reducts. First, we present the judgement theorem of consistent $\delta$-confidence sets.
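Theorem 10 (judgement theorem of consistent $\delta$-confidence set). Let $(U, A, L)$ be a multilabel decision table, and $B \subseteq A$. Then the following conditions are equivalent:
(1) $B$ is a consistent $\delta$-confidence set;
(2) for any $x, y \in U$, if $\lambda_A^{\delta}(x) \neq \lambda_A^{\delta}(y)$, then $[x]_B \cap [y]_B = \emptyset$,
where $\lambda_A^{\delta}$ denotes the $\delta$-confidence label function with respect to $A$.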
Proof. . If there exist such that , then . By Theorem 7(4), we have . Note that is a consistent -confidence set; we have and . Therefore .
. Since , it is easy to verify that forms a partition of .
For any , if , then . By the assumption we obtain .
Let . Then for all , we have ; that is to say, .
Therefore we have that As a result, .
On the other hand, we assume that ; however, . For any , we have ; hence ; that is to say, . Since , we have Therefore , which is a contradiction.
Thus we conclude that for any . According to Definition 8, we have that is a consistent -confidence set.
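The equivalence can be checked numerically on a small example. The Python sketch below reads condition (2) as "two objects whose label sets under the full attribute set differ must not fall into the same class under the subset", which is an interpretation of the theorem as it survives in this text. The toy data, the threshold name `delta`, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not Table 1 of the paper).
CONDITIONS = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 0, "a2": 1, "a3": 0},
    "x4": {"a1": 0, "a2": 1, "a3": 0},
    "x5": {"a1": 1, "a2": 0, "a3": 0},
    "x6": {"a1": 1, "a2": 0, "a3": 1},
}
LABELS = {
    "x1": {"l1"}, "x2": {"l1", "l2"}, "x3": {"l2"},
    "x4": {"l2", "l3"}, "x5": {"l3"}, "x6": {"l3"},
}
ATTRS = ["a1", "a2", "a3"]

def confidence_labels(attrs, delta):
    """For each object, the labels carried by at least delta of its class."""
    blocks = defaultdict(set)
    for obj, row in CONDITIONS.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    all_labels = set().union(*LABELS.values())
    result = {}
    for block in blocks.values():
        kept = frozenset(
            l for l in all_labels
            if sum(l in LABELS[o] for o in block) >= delta * len(block)
        )
        for obj in block:
            result[obj] = kept
    return result

def consistent_by_definition(subset, delta):
    """Condition (1): the label function is unchanged for every object."""
    return confidence_labels(subset, delta) == confidence_labels(ATTRS, delta)

def consistent_by_pairs(subset, delta):
    """Condition (2): objects with different full label sets are separated by subset."""
    full = confidence_labels(ATTRS, delta)
    for x, y in combinations(CONDITIONS, 2):
        same_class = all(CONDITIONS[x][a] == CONDITIONS[y][a] for a in subset)
        if same_class and full[x] != full[y]:
            return False
    return True

subsets = [list(c) for k in range(1, 4) for c in combinations(ATTRS, k)]
agreement = all(
    consistent_by_definition(s, 0.5) == consistent_by_pairs(s, 0.5) for s in subsets
)
```

The pairwise form never recomputes label sets for the subset, which is what makes a discernibility matrix construction possible.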
Theorem 10 provides an approach to judge whether a subset of attributes is a consistent $\delta$-confidence set in multilabel decision tables. Now we present a method for computing all $\delta$-confidence reducts. First, we give the following notion.
Definition 11. Let be a multilabel decision table and . One denotes
By the value of with respect to the objects in . Define
Then is called -confidence discernibility attribute sets. And is called the -confidence discernibility matrix.
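One way to build such a matrix is sketched below in Python. It makes two assumptions: an entry is recorded only for pairs of equivalence classes (under the full attribute set) whose confidence label sets differ, and the entry holds the attributes on which the two classes take different values. Pairs with equal label sets impose no constraint and are simply omitted here; how the source fills those entries is not recoverable from this text. The toy data and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not Table 1 of the paper).
CONDITIONS = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 0, "a2": 1, "a3": 0},
    "x4": {"a1": 0, "a2": 1, "a3": 0},
    "x5": {"a1": 1, "a2": 0, "a3": 0},
    "x6": {"a1": 1, "a2": 0, "a3": 1},
}
LABELS = {
    "x1": {"l1"}, "x2": {"l1", "l2"}, "x3": {"l2"},
    "x4": {"l2", "l3"}, "x5": {"l3"}, "x6": {"l3"},
}
ATTRS = ["a1", "a2", "a3"]

def class_label_sets(delta):
    """Map each class (keyed by its attribute values) to its confident labels."""
    blocks = defaultdict(set)
    for obj, row in CONDITIONS.items():
        blocks[tuple(row[a] for a in ATTRS)].add(obj)
    all_labels = set().union(*LABELS.values())
    return {
        values: frozenset(
            l for l in all_labels
            if sum(l in LABELS[o] for o in block) >= delta * len(block)
        )
        for values, block in blocks.items()
    }

def discernibility_entries(delta):
    """Attributes that separate each pair of classes with different label sets."""
    info = class_label_sets(delta)
    entries = {}
    for (v1, s1), (v2, s2) in combinations(info.items(), 2):
        if s1 != s2:
            entries[(v1, v2)] = {a for a, p, q in zip(ATTRS, v1, v2) if p != q}
    return entries

entries = discernibility_entries(0.5)
```

Classes are keyed by their tuple of attribute values, so the pair `((1, 0, 0), (1, 0, 1))` is absent from `entries`: both classes carry only `l3` and need not be told apart.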
For the $\delta$-confidence discernibility matrix, we have the following property.
Theorem 12. The discernibility matrix satisfies the following properties:(1) is a symmetric matrix; that is, for any , ;(2)elements in the main diagonals are all ; that is, for any , ;(3)for any , .
Proof. The proofs of (1) and (2) are straightforward. We only need to prove (3). If there exists such that but , that is, and , then according to Definition 11, we have and . Thus ; that is, , a contradiction.
In the following, we establish some connections between the consistent $\delta$-confidence set and the discernibility matrix.
Theorem 13. Let be a multilabel decision table, and . Then, is a consistent -confidence set if and only if for all .
Proof. “” For any , there exist , such that and . From the definition of , we have . Since is a consistent -confidence set, we have from Theorem 10. Therefore there exists such that ; that is, . Hence ; that is, .
“” Let . Since , for all , there exists such that . Then we have ; that is, for and . It means . We then conclude that if , that is, , then . It follows from Theorem 10 that is a consistent -confidence set.
Next we introduce the concept of the discernibility function, which helps us to compute $\delta$-confidence reducts.
Definition 14. Let be a multilabel decision table, let , and let be the -confidence discernibility matrix, where . A -confidence discernibility function for a multilabel decision table is a boolean function of boolean variables corresponding to the attributes , respectively, and is defined as follows: where is the disjunction of all variables such that .
In the sequel we will write instead of when no confusion arises. Furthermore, according to related logical knowledge, we have the following theorem.
Theorem 15. Let be a multilabel decision table. Then an attribute subset of is a -confidence reduct of if and only if is a prime implicant of .
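For a discernibility function written as a conjunction of disjunctions of attribute variables, the prime implicants are exactly the minimal attribute sets that intersect every disjunction. The Python sketch below finds them by exhaustive search. The clause list is a hypothetical example, `minimal_hitting_sets` is an illustrative name, and the search is exponential, so it is only suitable for small attribute sets.

```python
from itertools import combinations

def minimal_hitting_sets(attribute_names, entries):
    """Minimal attribute sets that intersect every non-empty entry.

    Each entry is read as one disjunction of the discernibility function,
    so the returned sets correspond to its prime implicants.
    """
    clauses = [set(e) for e in entries if e]
    found = []
    for size in range(len(attribute_names) + 1):
        for candidate in combinations(attribute_names, size):
            chosen = set(candidate)
            if any(h <= chosen for h in found):
                continue  # a smaller hitting set is contained, so not minimal
            if all(chosen & clause for clause in clauses):
                found.append(chosen)
    return found

# Hypothetical discernibility entries over attributes a1, a2, a3.
entries = [{"a2"}, {"a1"}, {"a1", "a3"}, {"a1", "a2"}, {"a1", "a2", "a3"}]
reducts = minimal_hitting_sets(["a1", "a2", "a3"], entries)

# A second example, (a1 OR a2) AND (a2 OR a3), has two prime implicants.
two_reducts = minimal_hitting_sets(["a1", "a2", "a3"], [{"a1", "a2"}, {"a2", "a3"}])
```

In the first example the singleton clauses force both `a1` and `a2`, and the larger clauses are absorbed, which leaves a single reduct. The second example shows that several reducts can coexist.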
Theorem 15 provides a discernibility matrix based method to compute all $\delta$-confidence reducts. The following example illustrates the validity of the approach.
Example 16. Consider the multilabel decision table given by Table 1. We have , where
According to the calculation results of in Example 6, we have
Note that . Therefore .
We can calculate the $\delta$-confidence discernibility matrix shown in Table 2.
Consequently, we have
By Theorem 15 we derive that is the unique -confidence reduct which accords with the results in Example 9.
5. Conclusions
The $\delta$-confidence reduct presented in this paper is an attribute reduction method designed for multilabel decision tables. Compared with the existing attribute reduction methods, the $\delta$-confidence reduct characterizes the uncertainty implied among labels more accurately; thus it is more appropriate for multilabel data. Moreover, we proposed a corresponding discernibility matrix based method to compute all $\delta$-confidence reducts, which is significant from both theoretical and applied perspectives. In further research, the properties of the $\delta$-confidence reduct and corresponding heuristic algorithms will be considered.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors thank the anonymous reviewers for their valuable comments and suggestions to improve the paper. The work was supported by the National Natural Science Foundation (nos. 61272095, 61175067, 71031006, 61303107, and 71201111), Shanxi Scholarship Council of China (no. 2013-014), Project supported by National Science and Technology (no. 2012BAH33B01), Shanxi Foundation of Tackling Key Problem in Science and Technology (no. 20110321027-02), and the Natural Science Foundation of Hebei Education Department (no. Z2014106).
References
[1] R. E. Schapire and Y. Singer, “BoosTexter: a boosting-based system for text categorization,” Machine Learning, vol. 39, no. 2-3, pp. 135–168, 2000.
[2] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown, “Learning multi-label scene classification,” Pattern Recognition, vol. 37, no. 9, pp. 1757–1771, 2004.
[3] A. Elisseeff and J. Weston, “A kernel method for multi-labelled classification,” in Advances in Neural Information Processing Systems 14, 2002.
[4] Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[5] K. Thangavel and A. Pethalakshmi, “Dimensionality reduction based on rough set theory: a review,” Applied Soft Computing Journal, vol. 9, no. 1, pp. 1–12, 2009.
[6] C. Wu, Y. Yue, M. Li, and O. Adjei, “The rough set theory and applications,” Engineering Computations, vol. 21, no. 5, pp. 488–511, 2004.
[7] J. W. Grzymała-Busse, Managing Uncertainty in Expert Systems, Kluwer Academic Publishers, 1991.
[8] J. W. Grzymała-Busse, “LERS-a system for learning from examples based on rough sets,” in Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, pp. 3–18, Kluwer Academic Publishers, New York, NY, USA, 1992.
[9] W. Ziarko, “Variable precision rough set model,” Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[10] M. Kryszkiewicz, “Comparative study of alternative types of knowledge reduction in inconsistent systems,” International Journal of Intelligent Systems, vol. 16, no. 1, pp. 105–120, 2001.
[11] D. Li, B. Zhang, and Y. Leung, “On knowledge reduction in inconsistent decision information systems,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 12, no. 5, pp. 651–672, 2004.
[12] A. Skowron and C. Rauszer, “The discernibility matrices and functions in information systems,” in Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, pp. 331–362, Kluwer Academic Publishers, 1992.
[13] Z. Xu, Z. Liu, B. Yang, and W. Song, “A quick attribute reduction algorithm with complexity of $\max(O(|C||U|), O(|C|^{2}|U/C|))$,” Chinese Journal of Computers, vol. 29, no. 3, pp. 391–398, 2006.
[14] Y. Qian, J. Liang, W. Pedrycz, and C. Dang, “Positive approximation: an accelerator for attribute reduction in rough set theory,” Artificial Intelligence, vol. 174, no. 9-10, pp. 597–618, 2010.
[15] N. Ghamrawi and A. McCallum, “Collective multi-label classification,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM '05), pp. 195–200, New York, NY, USA, November 2005.
[16] S. Kiritchenko, Hierarchical text categorization and its application to bioinformatics [Ph.D. thesis], Queen's University, Kingston, Canada, 2005.
[17] J. Read, Scalable multi-label classification [Ph.D. thesis], University of Waikato, Hamilton, New Zealand, 2010.