Abstract

Attribute reduction of a decision system is an important topic in rough set theory. Existing attribute reductions are designed only to preserve the confidence of every certain rule, and they cannot explicitly identify key conditional attributes for special decision rules. In this paper, we develop the concept of -local reduction in order to offer a minimal description for special -possible decision rules. The discernibility matrix approach is used to investigate the structure of a -local reduction and to compute all -local reductions. A medical diagnosis example illustrates the idea of the -local reduction. Finally, numerical experiments show that the proposed method is feasible and valid.

1. Introduction

The concept of rough sets was originally proposed by Pawlak [1] as a mathematical approach to handling imprecision, vagueness, and uncertainty in data analysis. The theory has proved useful and versatile in solving a variety of problems [1]. Its main application is attribute reduction in databases. Given a decision system with conditional and decision attributes, attribute reduction aims to find a subset of the original conditional attributes that contains the same information as the original set. Attribute reduction can be viewed as the strongest and most important result that distinguishes rough set theory from other theories.

Following [1], much research has concentrated on computing attribute reductions and on developing other types of attribute reduction within the rough set framework [1–23]. For example, Skowron and Rauszer [12] used the discernibility matrix approach to lay a mathematical foundation for finding reducts. Wang [15–17] characterized attribute reduction by information entropy. Possible rules and possible reducts of all decision classes were proposed to deal with inconsistency in an inconsistent decision table [4, 6]. In [5], in order to provide an underlying classification of knowledge reductions, five notions of knowledge reduction (possible reduct, approximation reduct, generalized decision reduct, -decision reduct, and -reduct) were investigated and compared in inconsistent systems. In fact, only two of them are essential, namely the possible reduct, which preserves upper approximations, and the -decision reduct, which preserves membership to all decision classes; each of the others is equivalent to one of these two. The notion of dynamic reducts was described in [2] as subsets of all reducts derived from both the original decision table and the majority of randomly chosen decision subtables. In [10], -reduct and -relative reduct were proposed to allow additional inconsistency, controlled by means of the parameter . In [19], the notions of distribution reduct and maximum distribution reduct were proposed, and the relationships among the maximum distribution reduct, the distribution reduct, and the possible reduct were discussed. In [3, 22], -reduct was introduced to preserve the sum of objects in -lower approximations of all decision classes based on variable precision rough sets (VPRS). However, Zhou et al. [24] pointed out that the dependency function may not be monotonic when computing a -reduct, and that decision rules derived from the -reduct may conflict with those derived from the original system. To overcome this drawback, -lower and -upper distribution reducts were proposed in [9] to preserve -lower approximations and -upper approximations of all decision classes, respectively. It was proved that, for some special thresholds, the -lower distribution reduct is equivalent to the maximum distribution reduct, whereas the -upper distribution reduct is equivalent to the possible reduct.

These attribute reductions share two limitations. First, they are developed in terms of all decision classes and cannot explicitly identify key conditional attributes for particular decision classes, so they can be viewed as global reductions. In many practical problems, however, people pay more attention to some special decision classes than to others, and the condition attributes and decision rules closely connected to these special decision classes are of particular interest. For example, in medical diagnosis, the key condition attributes related to a disease draw more attention than the other attributes, and it is clearly meaningful to identify them. Second, as is well known, both certain and possible rules can be extracted from a decision system; the confidence of every certain rule is 1, while the confidence of every possible rule is less than 1. Most existing attribute reductions only offer minimal sets of conditional attributes that keep the confidence of every certain rule invariant, and possible rules with high confidence are ignored. In most practical problems, however, possible rules with high confidence are available and are applied to decision making, so it is clearly meaningful to identify key conditional attributes for such rules.

To address both limitations simultaneously, the definition of -local reduction is presented in this paper. First we give the concept of -reduction, which keeps the confidence of those possible rules, and then we consider -local reduction, which offers a minimal description and extracts possible decision rules with high confidence for special decision classes. The discernibility matrix approach is used to characterize the structure of -local reduction. It is proven that the core of -reduction can be expressed as the union of the cores of -local reductions, and that the discernibility matrix of -reduction can be obtained by composing the discernibility matrices of -local reductions. A medical diagnosis example illustrates the idea of -local reduction, and several experiments demonstrate its effectiveness.

The rest of this paper is structured as follows. In the next section we give some basic notions related to rough sets. In Section 3 we define -local reduction and use the discernibility matrix approach to find -local reductions. In Section 4 we perform numerical experiments to demonstrate that the proposed method is feasible for processing massive data. We conclude the paper in Section 5.

An information system is a pair , where is a nonempty, finite set called the universe of discourse, and is a nonempty, finite set of attributes. With every subset of attributes , we associate a binary relation , called a -indiscernibility relation, and defined as , then is an equivalence relation and . By , we denote the equivalence class of , including . For , sets and are called -lower and -upper approximations of in and denoted as and , respectively. If , we say is definable, otherwise it is indefinable.
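The notions in the preceding paragraph can be made concrete with a small sketch. It assumes the standard Pawlak definitions: two objects are indiscernible when they agree on every chosen attribute, the lower approximation collects the equivalence classes contained in the target set, and the upper approximation collects the classes that meet it. The table, the attribute names `a` and `b`, and the function names are hypothetical and serve only as illustration.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object ids by their values on attrs (indiscernibility classes)."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def lower_upper(table, attrs, target):
    """Return the lower and upper approximations of target w.r.t. attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # class entirely inside the target set
            lower |= block
        if block & target:    # class overlapping the target set
            upper |= block
    return lower, upper

# Hypothetical toy information system: object id -> attribute values.
table = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 0},
    3: {"a": 1, "b": 0},
    4: {"a": 1, "b": 1},
}
target = {1, 3}
lower, upper = lower_upper(table, ["a", "b"], target)
definable = lower == upper
```

In this toy table objects 1 and 2 are indiscernible, so the target set {1, 3} is not definable: its lower approximation contains only object 3, while its upper approximation also absorbs object 2.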

A decision table (DT) (sometimes called a decision system) is an information system , where , . is a set of conditional attributes, while is the decision attribute.

Suppose , then we have . If , then for we can derive decision rule as . This rule can be denoted in terms of sets as and its confidence is computed as . In what follows, a decision rule whose confidence is not less than is called a -possible decision rule.
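As an illustration of rule confidence, the following sketch assumes the usual rough set convention that the confidence of a rule is the fraction of objects in a condition class that also belong to the decision class. The attribute names `fever` and `flu`, the data, and the function names are hypothetical; the threshold 0.6 is chosen to match the value used in the examples of this paper.

```python
from collections import defaultdict

def rule_confidences(rows, cond, dec):
    """Map (condition values, decision value) to the confidence of that rule."""
    block_size = defaultdict(int)   # size of each condition class
    joint = defaultdict(int)        # objects in a condition class with a given decision
    for row in rows:
        key = tuple(row[a] for a in cond)
        block_size[key] += 1
        joint[(key, row[dec])] += 1
    return {k: n / block_size[k[0]] for k, n in joint.items()}

def possible_rules(rows, cond, dec, threshold):
    """Keep only the rules whose confidence is not less than threshold."""
    conf = rule_confidences(rows, cond, dec)
    return {k: c for k, c in conf.items() if c >= threshold}

# Hypothetical data: five patients with fever (three with flu), one without fever.
rows = [
    {"fever": 1, "flu": 1},
    {"fever": 1, "flu": 1},
    {"fever": 1, "flu": 1},
    {"fever": 1, "flu": 0},
    {"fever": 1, "flu": 0},
    {"fever": 0, "flu": 0},
]
rules = possible_rules(rows, ["fever"], "flu", 0.6)
```

Here the rule "fever = 1 implies flu = 1" has confidence 3/5 and is retained, the rule "fever = 1 implies flu = 0" has confidence 2/5 and is dropped, and "fever = 0 implies flu = 0" is a certain rule with confidence 1.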

3. -Local Reduction of Decision Systems

In this section, we first introduce the definition of -reduction as a global reduction that considers every -possible decision rule, and we then develop -local reduction as an improvement of -reduction that addresses special -possible decision rules. The discernibility matrix approach is used to find -local reductions, and the differences between -local reduction and -reduct are explained.

Let be a decision system, , , and , . If for any , , then . Obviously, only includes a single element unless . For two -possible decision rules and , if satisfying , then clearly the rule is also a -possible decision rule, which implies , thus we have the definition of -reduction.

Definition 3.1. is a -reduction of DT if and only if is a minimal set such that for  for all .

Rule can be seen as the reduced rule of the rule . A -reduction is a set of conditional attributes that keeps the confidence of every reduced rule of a -possible decision rule not less than , since for any object satisfying , , always holds. When , a -reduction is a classical attribute reduction which preserves the lower approximations of all decision classes, that is, it preserves the confidences of all certain rules.
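The preservation property described above can be checked mechanically. The sketch below follows one reading of that sentence, which is an assumption rather than the formal statement of Definition 3.1: a subset of condition attributes is acceptable if every rule whose confidence reaches the threshold under the full attribute set still reaches it after the rule is reduced to the subset. Minimality is not checked, and the data and names are hypothetical.

```python
def confidence(rows, attrs, dec, row, d):
    """Confidence of the rule 'values of row on attrs imply decision d'."""
    key = tuple(row[a] for a in attrs)
    matching = [r for r in rows if tuple(r[a] for a in attrs) == key]
    return sum(r[dec] == d for r in matching) / len(matching)

def preserves_possible_rules(rows, cond, subset, dec, threshold):
    """True if every rule with confidence >= threshold under cond
    keeps confidence >= threshold when reduced to subset."""
    decisions = {r[dec] for r in rows}
    for row in rows:
        for d in decisions:
            if confidence(rows, cond, dec, row, d) >= threshold:
                if confidence(rows, subset, dec, row, d) < threshold:
                    return False
    return True

# Hypothetical inconsistent decision table with condition attributes a, b.
rows = [
    {"a": 0, "b": 0, "d": 1},
    {"a": 0, "b": 0, "d": 1},
    {"a": 0, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 0},
]
keeps_a = preserves_possible_rules(rows, ["a", "b"], ["a"], "d", 0.6)
keeps_b = preserves_possible_rules(rows, ["a", "b"], ["b"], "d", 0.6)
```

In this toy table the attribute `a` alone keeps both high-confidence rules above 0.6, whereas `b` alone merges the last object with the first two and drops the confidence of its rule to 1/3.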

-reduction is defined by considering all -possible decision rules; thus, it can be regarded as a global reduction for the whole system. In many practical problems, people pay more attention to special -possible decision rules related to special decision classes, so we refine the -reduction into the -local reduction, defined as follows, to capture key condition attributes for special decision rules.

Let , that is, is the family of all definable sets related to . is the set satisfying the following conditions: (1) is a definable set, that is, ; (2) every can derive a decision rule such that . We denote the family of all these satisfying the above two conditions as . Clearly is a -algebra, and every element in is the union of several elements in , .

Definition 3.2. Suppose is a decision system, such that . If , then is -dispensable in for , otherwise is -indispensable in for . The collection of all the -indispensable elements in is called the -local core of for and denoted as . We say that is independent in for if every attribute in is -indispensable in for . A set is called a -local reduction in for if is independent in for and satisfying , that is, is the minimal subset of keeping .

If is a -local reduction for , then for every and , implies , that is, a -local reduction in for keeps the confidences of the reduced rules of all -possible decision rules determined by elements in not less than . Furthermore, for every we have , and is just a group of decision classes; thus a -local reduction in for aims to select key condition attributes for this group of decision classes rather than for all decision classes. Let such that , . If is a -local reduction for , then , that is, for , thus for every and is a -reduction. This implies that a -reduction is a special case of a -local reduction. The -reduction considers all -possible decision rules and decision classes, while the -local reduction is developed in terms of special -possible decision rules and decision classes. In particular, for satisfying , if , then a -local reduction for only considers one decision class .

Remark 3.3. In [3, 22], -reduct was developed to keep the -dependency function. It appears to be closely connected to the -reduction in this paper, but the two concepts are different. First, -reduct is proposed in the framework of VPRS, while -reduction is developed within the framework of classical rough sets and does not need a new rough set model. Second, -reduct was introduced to preserve the sum of objects in the -lower approximations of all decision classes, whereas -reduction aims to keep the -lower approximation of every decision class. Third, as pointed out in [24], -reduct cannot keep the confidence of the reduced rules of some -possible rules not less than , but a -reduction avoids this drawback by keeping the confidence of every -possible decision rule not less than . Finally, in VPRS, possible rules with high confidence are attributed to noise; when the noise is ignored, these rules are treated as certain ones. However, if such possible rules are due not to noise but to roughness, a risk is ignored when they are applied to practical problems as certain rules. Thus, -reduct has no formalism to distinguish noise from roughness. Since -reduction still treats all possible rules as possible ones, it can handle both noise and uncertainty at the same time. Since a -reduction is a special case of a -local reduction, a -reduct and a -local reduction are clearly different. Furthermore, a -local reduction is proposed to capture key attributes for special decision classes, and a -reduct cannot do this since it has to consider all decision classes at the same time. We first give an example to show that the -reduct and the -local reduction are indeed different.

Example 3.4. An inconsistent decision table is given as Table 1.
Let , be the set of all 0.6-reducts, while is the set of all 0.6-local reductions for . Clearly, no 0.6-reduct is a 0.6-local reduction, and no 0.6-local reduction is a 0.6-reduct.

Next we study the properties of the -local reduction. The set of all -local reductions in for is denoted by , and we have the following theorem.

Theorem 3.5. .

Proof. (1) For any , holds. Suppose , then there exists a -local reduction for s.t. , such that , we get contradiction, hence , namely .
(2) For any , suppose , then , therefore there exists a -local reduction for s.t. , then , thus , we get contradiction, hence , namely, . From (1), (2), we can prove Theorem 3.5.
According to Theorem 3.5, the -local core can be used as the basis for finding all -local reductions for , since it is included in every -local reduction for .
If elements in are not pairwise disjoint, then there exists a whose elements are pairwise disjoint and which satisfies . We only prove this statement when .

Theorem 3.6. Suppose , and , then .

Proof. For any , , thus we finish the proof.

In what follows we always assume that the elements in are pairwise disjoint. We have the following theorem for the -local core.

Theorem 3.7. and for any , , then we have .

Proof. For any there exists satisfying .
When , , , a -local reduction for is a -reduction. Thus, the core of -reduction can be expressed as the union of the cores of -local reductions for . From Theorem 3.7 we can infer that the elements in the -local core for are indispensable for a certain group of decision classes. If we pay more attention to a special group of decision classes, then the -local reduction may offer fewer conditional attributes, namely only those that are indispensable for that group. This is the objective of -local reductions. Next we study the computation of -local reductions.

Definition 3.8. Let be a DT, , , and . Denoted by as the value of samples in in terms of . Define then is called the -local discernibility matrix for .

From the definition of -local discernibility matrix for we can easily get , namely, can be expressed as the union of . If , and , , then the discernibility matrix for can be obtained by composing discernibility matrices for .

Theorem 3.9. satisfies the following properties.
(1) It is a symmetric matrix, that is, .
(2) if or holds; in particular, , .

The proofs of the following two theorems are straightforward.

Theorem 3.10. .

Theorem 3.11. includes a -local reduction for if and only if for .

By Theorem 3.10, the -local core consists of the attributes that appear as single-element entries of the -local discernibility matrix; thus, we can obtain directly from the -local discernibility matrix.

Definition 3.12. Let be a DT. A Boolean function is denoted by , , then is referred to as the -local discernibility function for .
Let be the reduced disjunctive form of obtained by applying the distribution and absorption laws as many times as possible. Then there exist and for such that , and we have the following theorem.

Theorem 3.13. .

The proof of Theorem 3.13 is similar to the one for traditional rough sets in [12].
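The passage from a discernibility matrix to its reductions can be sketched in code. The sketch assumes the classical correspondence from [12]: the discernibility function is the conjunction, over the nonempty matrix entries, of the disjunction of the attributes in each entry, so the terms of its reduced disjunctive form are exactly the minimal attribute sets that intersect every nonempty entry. The entries below are hypothetical rather than taken from any table in this paper, and the exhaustive search is intended only for small attribute sets.

```python
from itertools import combinations

def core(entries):
    """Attributes that occur as single-element entries."""
    return {next(iter(e)) for e in entries if len(e) == 1}

def minimal_hitting_sets(entries, attributes):
    """All minimal attribute sets that intersect every nonempty entry."""
    entries = [set(e) for e in entries if e]
    found = []
    for size in range(len(attributes) + 1):
        for cand in combinations(sorted(attributes), size):
            cand = set(cand)
            if any(h <= cand for h in found):
                continue  # absorption: a smaller solution is already contained
            if all(cand & e for e in entries):
                found.append(cand)
    return found

# Hypothetical nonempty entries of a discernibility matrix.
entries = [{"a"}, {"a", "b"}, {"b", "c"}, {"c", "d"}]
local_core = core(entries)
reductions = minimal_hitting_sets(entries, {"a", "b", "c", "d"})
```

For these entries the search returns two reductions, {a, c} and {a, b, d}, and their intersection {a} coincides with the core read off from the single-element entry, in line with Theorems 3.5 and 3.10.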

We now use an example to illustrate the idea of -local reduction.

Example 3.14. When one suffers from a disease, certain symptoms can be observed. The doctor observes the patient's symptoms and signs in order to make a diagnosis. In the following decision table (Table 2), the symptoms of ten patients were observed and recorded. We would like to know which symptoms are closely related to influenza.
Let and , then for any and for any . Thus, -local discernibility matrices for , and are as follows respectively:
Clearly every in is the union of the corresponding elements in and . Because , a -local reduction for is a -reduction; in other words, the discernibility matrix of -reduction can be obtained by composing the discernibility matrices of -local reductions. It is easy to see that the corresponding -local reduction for is and . Similarly, and , .
Suppose we pay more attention to influenza than to the other diseases, that is, we concentrate on the decision class with the value 1. Then the 0.6-local reduction can be used by a doctor to judge whether a patient has influenza. In contrast, is a 0.6-reduction and cannot explicitly identify key conditional attributes for a particular decision class. Moreover, a -local reduction keeps the confidence of the important -possible decision rules extracted from special decision classes not less than . For instance, we have the following 0.6-possible decision rules and their reduced ones related to .
The original 0.6-possible decision rules related to are as follows:(i).(ii).(iii).(iv).
The reduced rules are as follows:().().().
From the above we see that a -local reduction keeps the confidence of the 0.6-possible decision rules for decision class . Thus, a -local reduction can explicitly identify key conditional attributes for particular decision classes and keeps the confidence of the -possible decision rules for these decision classes not less than . Therefore, -local reduction can be considered an effective method for dealing with massive data.

4. Algorithm to Find One -Local Reduction and Numerical Experiments

In this section, we develop an algorithm to find a -local reduction. We then perform numerical experiments on large data sets to demonstrate that -local reduction can reduce the number of condition attributes while keeping the classification accuracy close to that of the raw data, which gives preliminary evidence that the proposed method is feasible for processing massive data.

4.1. Algorithm to Find One -Local Reduction

In this subsection, we develop a heuristic algorithm to find one -local reduction by the discernibility matrix approach proposed in Section 3.

Algorithm 4.1. To find one -local reduction for of a certain decision class, the following should be carried out:
Input: .
Output: One -local reduction .
Initialize: .
Step 1: Compute by Definition 3.8.
Step 2: Compute ; and delete those with nonempty overlap with .
Step 3: Let .
Step 4: Add the element whose frequency of occurrence is maximum in all into ; and delete those with nonempty overlap with .
Step 5: If there still exist some , go to Step 4; otherwise, go to Step 6.
Step 6: If is not independent, delete the redundant elements in .
Step 7: Output .
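The steps of Algorithm 4.1 can be sketched as follows, starting from the nonempty entries of an already computed discernibility matrix (Step 1 is therefore not shown). Several details are assumptions made for this sketch: the set computed in Step 2 is taken to be the core, that is, the attributes of the single-element entries; ties in Step 4 are broken alphabetically; and in Step 6 an attribute is treated as redundant when the remaining attributes still intersect every entry. The function name and the sample entries are hypothetical.

```python
from collections import Counter

def greedy_reduction(entries):
    """Greedy search for one reduction from discernibility matrix entries."""
    entries = [frozenset(e) for e in entries if e]
    # Steps 2-3: start from the core and drop the entries it already covers.
    chosen = {next(iter(e)) for e in entries if len(e) == 1}
    remaining = [e for e in entries if not (e & chosen)]
    # Steps 4-5: repeatedly add the most frequent attribute among the
    # uncovered entries, then drop the entries it covers.
    while remaining:
        counts = Counter(a for e in remaining for a in e)
        best = max(sorted(counts), key=lambda a: counts[a])
        chosen.add(best)
        remaining = [e for e in remaining if best not in e]
    # Step 6: remove attributes that are not needed to cover all entries.
    for a in sorted(chosen):
        trial = chosen - {a}
        if all(trial & e for e in entries):
            chosen = trial
    return chosen

# Hypothetical nonempty entries of a discernibility matrix.
entries = [{"a"}, {"a", "b"}, {"b", "c"}, {"c", "d"}]
reduction = greedy_reduction(entries)
```

On these entries the core is {a}; the attribute c then covers both remaining entries, so the sketch returns {a, c}. As with any greedy heuristic, the result is one reduction and is not guaranteed to be the smallest one.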

The computational complexity of this algorithm is . Here is the size of universe, is the number of condition attributes.

4.2. Numerical Experiments

In this subsection, we perform experiments to demonstrate that, with -local reduction and -reduction, the condition attributes of a massive data set can be reduced for a suitable parameter . We also use a support vector machine (SVM) as a classifier to compare the classification accuracies of the reduced and raw data sets. The experiments are set up as follows.

4.2.1. Experimental Setup

Dataset
Six datasets from University of California, Irvine (UCI) Machine Learning Repository [25] are used (see Table 3).

Classifier
SVM in SVM-KM MATLAB Toolbox is employed as the classifier.

Dataset Split
In the process of classification, 10-fold cross-validation is applied on the six datasets.

Dataset Discretization
The fuzzy C-means method proposed in [26] is used to discretize real-valued condition attributes.

Indices
They are (1) the number of selected attributes in the reduct, (2) classification accuracy of the reduct.

Parameter Specification
From confidences of all decision rules, we randomly choose a confidence which is greater than 0.5 as our experimental parameter on a specific dataset.

Accuracy
The accuracy in this paper is calculated for each decision class as the ratio of the number of samples classified correctly in that decision class to the number B of samples in that decision class.
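The per-class accuracy described above can be computed as in the following sketch. The label vectors are hypothetical and the function name is introduced only for illustration; it is not part of the SVM-KM toolbox used in the experiments.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """For each decision class: correctly classified samples / samples in the class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}

# Hypothetical true and predicted decision values for five samples.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0]
acc = per_class_accuracy(y_true, y_pred)
```

In this toy case two of the three samples of class 1 are classified correctly, giving an accuracy of 2/3 for that class, while both samples of class 0 are classified correctly.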

First, Table 4 shows a detailed comparison of the condition attributes in the -local reduction and the -reduction. Clearly, -local reductions differ from -reductions. In most cases, a -local reduction is a subset of some -reduction; that is, the number of condition attributes in the -local reduction is often smaller than the number in the -reduction.

Next, Table 5 shows that although the average accuracy on the -local reduction (0.79466) is slightly lower than the average accuracy on the raw data (0.8147333), it is higher than the average accuracy on the -reduction (0.75170666). Thus the -local reduction keeps the classification accuracy within a small perturbation of the raw-data accuracy. Table 5 also shows that the average number of attributes in the -local reduction (10.33333) is smaller than that in the -reduction (12.066666) and far smaller than that of the raw data (20.66666). In particular, for the dataset "BreastTissue," the number of attributes in the -local reduction is far smaller than in the -reduction and in the raw data set for every decision class.

These results give preliminary evidence that the idea of -local reduction is effective for dealing with some massive data. However, we select a different parameter for each data set, and how to select a suitable parameter for a given data set is a complex problem. We omit a detailed discussion of this topic in this paper.

5. Conclusion

Attribute reduction is a key topic in rough set theory. Existing methods of attribute reduction ignore possible rules and cannot capture key condition attributes for special decision classes. In this paper, we develop the concept of -local reduction, by which possible rules with high confidence are considered and key conditional attributes related to special decision classes can be selected. The discernibility matrix approach is used to find -local reductions. Experiments demonstrate the effectiveness of the idea of -local reduction.

Acknowledgment

This work was supported by a grant from the NSFC (71171080).