Abstract

A key problem in attribute reduction for information systems is how to evaluate the importance of an attribute. Existing algorithms are challenged by the variety of data forms found in information systems. Based on rough set theory, we present a new approach to attribute reduction for incomplete information systems and fuzzy-valued information systems. To evaluate the importance of an attribute effectively, we propose a novel algorithm supported by rigorous theorems. Experiments illustrate the effect of the proposed algorithm.

1. Introduction

Information systems contain various kinds of data, such as real-valued attributes, symbolic attributes, and fuzzy-set attributes. This variety of data forms poses a challenge to attribute reduction algorithms. How to evaluate the importance of an attribute is a key problem for attribute reduction and is an active research topic [16]. In [7], a fuzzy discernibility matrix was studied to depict the relationships among attributes in an information system. Chen et al. [2] argued that the heuristic algorithm in [7] usually cannot find a proper reduct, but only an overcut-reduct or subreduct, because of its stop criteria; based on the minimal elements of the discernibility matrix, they proposed an attribute reduction algorithm with fuzzy rough sets. In [8], attribute reduction with fuzzy rough sets was approached through a new construction of the discernibility matrix. Rough set theory provides an important tool for data mining and knowledge discovery, and its applications attract more and more attention [1, 9–15]. Attribute reduction for information systems using the discernibility matrix is an important application of rough set theory. The concept and structure of the traditional discernibility matrix are based on attributes whose values are symbols [3, 10], and they remain research hotspots in rough set theory and its applications. However, it is difficult to apply the traditional discernibility matrix to incomplete information systems, real-valued attribute information systems, and fuzzy information systems. Although extended rough set models and segmentation methods for continuous real-valued attributes have been proposed [2, 13, 16, 17], many problems related to the traditional discernibility matrix remain. In particular, the capacity of reduction is greatly reduced because the threshold that decides equivalence in the discernibility matrix is set artificially.
In this paper, a new concept, the soft discernibility matrix, is proposed, and related theorems as well as examples are given. With the soft discernibility matrix, attribute reduction can be carried out for incomplete information systems and fuzzy information systems. Experiments illustrate the effect of the proposed algorithm.

2. Basic Concepts

Let $S = (U, A, V, f)$ be an information system, where $U$ is a universe, $A$ is the attribute set, $V$ is the attribute value set, and $f$ is the mapping $U \times A \to V$. For convenience of explanation, we assume that the attribute set includes condition attributes and one decision attribute (with values ). Let be the partition into equivalence classes derived from the decision attribute , in which

Each and its attribute value are called information rule in information system; in this paper, we call it rule for short. The attribute value of an attribute ( or ) under rule of is denoted by .

If every attribute value in an information system is known, the system is called a complete information system. If some attribute values are unknown, the system is called an incomplete information system. Generally, the attribute value set of an information system is a set of symbols. The system is called a fuzzy attribute information system when its attributes are fuzzy sets. The system is called a real-valued information system, or a continuous attribute information system, when the attribute values are real numbers.

Based on rough sets [9], we give the following Definitions 1–4.

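Definition 1. Let $X$ be a subset of the universe $U$ and let $R$ be an equivalence relation on $U$. The lower approximation of $X$ with respect to $R$ is $\underline{R}(X) = \bigcup \{ Y \in U/R : Y \subseteq X \}$, where $U/R$ is the set of equivalence classes of $U$ under the equivalence relation $R$.

Definition 2. Let $U$ be a universe and let $P$ and $Q$ be families of equivalence relations on $U$. The positive region of $Q$ with respect to $P$ is defined as $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{P}(X)$.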
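
Definitions 1 and 2 can be illustrated with a short Python sketch. It assumes the standard rough-set reading of the lower approximation and the positive region; the toy table and the names `partition`, `lower_approximation`, and `positive_region` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def partition(universe, key):
    """Group the objects of `universe` into equivalence classes by key(obj)."""
    blocks = defaultdict(set)
    for x in universe:
        blocks[key(x)].add(x)
    return list(blocks.values())

def lower_approximation(blocks, target):
    """Union of the equivalence classes that lie entirely inside `target`."""
    result = set()
    for block in blocks:
        if block <= target:
            result |= block
    return result

def positive_region(cond_blocks, dec_blocks):
    """Union of the lower approximations of every decision class."""
    result = set()
    for dec_class in dec_blocks:
        result |= lower_approximation(cond_blocks, dec_class)
    return result

# Hypothetical toy table: object -> (a1, a2, decision)
table = {
    "x1": (0, 0, "no"),
    "x2": (0, 0, "yes"),
    "x3": (1, 0, "yes"),
    "x4": (1, 1, "no"),
}
U = set(table)
cond_blocks = partition(U, lambda x: table[x][:2])  # classes under (a1, a2)
dec_blocks = partition(U, lambda x: table[x][2])    # classes under decision
pos = positive_region(cond_blocks, dec_blocks)
print(sorted(pos))  # ['x3', 'x4']
```

Objects x1 and x2 agree on both condition attributes but differ in decision, so neither belongs to the positive region.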

Definition 3. For , approximation accuracy of partition is where is cardinal number of set.

Definition 4. For , if and there does not exist a being and , is called one attribute reduction of .
The intersection of all attribute reductions of is called the core of , which is denoted as .

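Definition 5 (see [10]). Given an information system $S$ with condition attribute set $C$ and decision attribute $d$, the elements of the discernibility matrix are $m_{ij} = \{ a \in C : a(x_i) \neq a(x_j) \}$ if $d(x_i) \neq d(x_j)$, and $m_{ij} = \emptyset$ otherwise, where $a(x)$ is the value of attribute $a$ under rule $x$ and $\emptyset$ is the empty set.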
Paper [10] pointed out that if an element of the discernibility matrix contains only one attribute, that attribute belongs to the core.
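
As a minimal sketch of the classical (symbolic, complete) case, the following Python code builds the lower triangle of a decision-relative discernibility matrix and reads the core off its one-attribute elements. It assumes the common convention that an element is empty when the two rules share the same decision value; the data and function names are hypothetical.

```python
def discernibility_matrix(rows, decisions):
    """Lower triangle of the discernibility matrix.

    rows: list of dicts mapping condition attribute -> value.
    decisions: list of decision values, one per row.
    Returns {(i, j): set of attributes} for i > j.
    """
    matrix = {}
    for i in range(len(rows)):
        for j in range(i):
            if decisions[i] != decisions[j]:
                # attributes on which the two rules take different values
                matrix[(i, j)] = {a for a in rows[i] if rows[i][a] != rows[j][a]}
            else:
                matrix[(i, j)] = set()  # same decision: nothing to discern
    return matrix

def core_from_matrix(matrix):
    """Attributes that appear alone in some element of the matrix."""
    core = set()
    for entry in matrix.values():
        if len(entry) == 1:
            core |= entry
    return core

# Hypothetical three-rule table
rows = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
decisions = [0, 1, 1]
m = discernibility_matrix(rows, decisions)
core = core_from_matrix(m)
print(m[(1, 0)], core)  # {'b'} {'b'}
```

Rules 0 and 1 differ only on attribute b, so b is the only attribute able to separate them and must be kept.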

3. Soft Discernibility Matrix

3.1. Probability Discernibility Matrix

Suppose that is an incomplete information system, is the attribute value set, represents an uncertain attribute value in the information system, and is the total number of types of attribute value in the system. The matrix constructed from the incomplete information system is called the probability discernibility matrix.

Definition 6. Given incomplete information system , we denote If one of and is , If both and are , The elements in probability discernibility matrix are where is empty set.
In the probability discernibility matrix, is an element set composed of attribute symbols, each carrying a coefficient or weight.
Given an attribute , for any two rules and , if and are known (namely, each is a certain value or symbol), we denote ; if one of and is , we denote ; if both and are , we denote .
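
The idea of attaching a coefficient to each attribute in an element can be sketched in Python. The specific weights used here are an assumption made only for illustration and are not taken from Definition 6: an unknown value is treated as uniformly distributed over the k possible attribute values, so the weight is the probability that the two values differ. The marker `MISSING` and the function names are hypothetical, and the caller is assumed to apply `weighted_entry` only to pairs of rules with different decision values.

```python
from fractions import Fraction

MISSING = "*"  # hypothetical marker for an unknown attribute value

def differ_weight(v1, v2, k):
    """Probability that two attribute values differ.

    Assumption (not from the paper): an unknown value is uniform over
    the k possible values and independent of the other value.
    """
    if v1 != MISSING and v2 != MISSING:
        return Fraction(1) if v1 != v2 else Fraction(0)
    # at least one value unknown: they coincide with probability 1/k
    return Fraction(k - 1, k)

def weighted_entry(row_i, row_j, k):
    """Element of a weighted discernibility matrix: attribute -> weight > 0."""
    entry = {}
    for attr in row_i:
        w = differ_weight(row_i[attr], row_j[attr], k)
        if w > 0:
            entry[attr] = w
    return entry

# Hypothetical rules with binary attributes (k = 2)
row_i = {"a": 0, "b": MISSING, "c": 1}
row_j = {"a": 1, "b": 0, "c": 1}
entry = weighted_entry(row_i, row_j, k=2)
print(entry)  # {'a': Fraction(1, 1), 'b': Fraction(1, 2)}
```

Under this assumption, attribute a discerns the two rules with certainty, b only possibly, and c not at all.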

Definition 7. Given incomplete information system , the similarity of rule and rule in the set of is defined as follows.
For every attribute, satisfy or is empty set; let ; otherwise let where , .
Suppose the subscript set of which belongs to one partition of derived from equivalence class of is ; if , we denote . If , we let .

Definition 8. For and a partition , the approximate accuracy of can be defined as follows: where .

Definition 9. For , if and there does not exist satisfying , then is called one attribute reduction of . The intersection of all attribute reductions of is called the core of and is denoted as .
In the probability discernibility matrix, if an element contains only one attribute, that attribute is called a single attribute. If an element contains two or more attributes, they are called multiattribute items.

Theorem 10. A single attribute whose coefficient in the probability discernibility matrix is 1 is a core attribute of the corresponding incomplete information system.

Proof. In the probability discernibility matrix, a single attribute is an attribute under which two rules with different decision attribute values have different attribute values, while they have the same attribute value under every other condition attribute. We can conclude from Definitions 6 and 8 that if a single attribute is deleted, must be reduced. Then, according to Definition 9, the attribute must be a core attribute, which proves the theorem.

In an information system, the importance of an attribute can be determined by its impact on the approximate accuracy of the system. An attribute with a larger impact on approximate accuracy is more important than one with a smaller impact. Attributes that can be reduced are called unimportant attributes.

Theorem 11. In the probability discernibility matrix, (1) if two attributes are single attributes, the attribute whose sum of coefficients is larger is more important; (2) if an attribute always appears together with another attribute whose coefficient is 1 in the same element set, then the attribute must be unimportant.

Proof. (1) If two attributes are single attributes, from Definitions 7 and 8 we can see that their impact on is determined by the sum of the coefficients of the single attribute.
The larger the sum of coefficients is, the greater the impact on is, so the attribute is more important.
(2) If an attribute always appears together with another attribute whose coefficient is 1 in the same element set, then the attribute certainly or possibly takes different values under different rules. We can conclude from Definition 7 that , so . According to Definitions 7 and 8, the attribute has no impact on , so this attribute is unimportant.

3.2. Fuzzy Discernibility Matrix

For convenience of explanation, we suppose that each attribute takes a single value for each rule in the fuzzy information system.

For fuzzy attribute information systems, the concept of equivalence class is no longer applicable, so the traditional method cannot be used to construct a discernibility matrix. In fact, an equivalence class represents a correlation. If correlation is seen as similarity, then equivalence can be seen as association with the value 1; equivalence is therefore a special case of similarity. When constructing the fuzzy discernibility matrix, each rule is viewed as one fuzzy set formed by its condition attribute values, and we investigate the similarities between these fuzzy sets.

Definition 12. Given fuzzy information system , the element in fuzzy discernibility matrix is defined as follows: where . is fuzzy similarity of and in the case that . is fuzzy similarity in the case that attribute is deleted.

From Definition 12 we can conclude that each element in fuzzy discernibility matrix is condition attribute set, and it corresponds to a fuzzy set .

Here we denote as the sum of coefficients of .

In [16], the authors give similarity measure formulas. Suppose that , are two fuzzy sets; then the similarity measure based on inclusion degree is and the similarity measure based on closeness degree is

Given a fuzzy attribute information system , the similarity of rule and is denoted as . For each derived from equivalence class of , let the subscript set of be ; if , we denote

Definition 13. For , approximation accuracy of partition is where . is called approximate accuracy of fuzzy information system.

Definition 14. An attribute satisfying is called a core attribute. The set of core attributes of the system is denoted by .
For attribute reduction, the importance of an attribute depends on its impact on the approximate accuracy .

Theorem 15. An attribute with a larger sum of coefficients over all the elements of the fuzzy discernibility matrix is more important.

Proof. We can conclude from Definition 12 that the sum of coefficients in all elements can be described as where From Definitions 13 and 14, we can conclude that the size of the approximate accuracy depends on the size of . The attribute with the larger coefficient has more impact on , so it is more important.

Theorem 16. In the fuzzy discernibility matrix, if the sum of an attribute's coefficients over all elements is larger than 0, then the attribute is a core attribute.

Proof. As in the proof of Theorem 10, we can conclude that could become larger if is reduced in the case of , and so is a core attribute.

We can carry out attribute reduction in a fuzzy information system by using Definitions 12–14 and Theorems 15 and 16.

4. Attribute Reduction

To illustrate the application of the soft discernibility matrix, we give two examples of attribute reduction, one for an incomplete information system and one for a fuzzy information system.

4.1. Reduction for Incomplete Information System

The method that uses the probability discernibility matrix for attribute reduction in an incomplete information system is as follows.

The first step is to construct the corresponding discernibility matrix according to Definition 6 and then to find the core attributes and the unimportant attributes according to Theorems 10 and 11. The core attributes are retained and the unimportant attributes are deleted. The remaining attributes are then added to the reduction set one at a time, most important first, according to their impact on approximate accuracy.
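
The retain, delete, and add procedure above can be sketched as a greedy loop in Python. This is only an illustration under stated assumptions: it works on a complete symbolic table and uses the classical rough-set dependency degree as a stand-in for the approximate accuracy of Definition 8. The names `dependency`, `greedy_reduct`, and the parameter `removable` (attributes already judged unimportant) are hypothetical.

```python
def dependency(rows, decisions, attrs):
    """Fraction of rules whose decision is fully determined by `attrs`.

    Stand-in for the paper's approximate accuracy (an assumption).
    """
    groups = {}
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, set()).add(d)
    consistent = sum(
        1 for row in rows
        if len(groups[tuple(row[a] for a in attrs)]) == 1
    )
    return consistent / len(rows)

def greedy_reduct(rows, decisions, core, removable=()):
    """Keep the core, drop `removable`, then add attributes greedily."""
    all_attrs = sorted(rows[0])
    target = dependency(rows, decisions, all_attrs)
    reduct = sorted(core)
    candidates = [a for a in all_attrs if a not in core and a not in removable]
    while dependency(rows, decisions, reduct) < target and candidates:
        # pick the attribute that raises the accuracy the most
        best = max(candidates,
                   key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
        candidates.remove(best)
    return sorted(reduct)

# Hypothetical table in which attribute c duplicates attribute a
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
decisions = [0, 1, 1, 1]
result = greedy_reduct(rows, decisions, core={"b"})
print(result)  # ['a', 'b']
```

A greedy loop of this kind returns a set that preserves the accuracy of the full attribute set, but it does not guarantee that the set is minimal.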

Table 1 is an incomplete information system, in which , , , and are condition attributes and is the decision attribute.

In Table 1, the values of attribute are 0 or 1, . We can obtain the elements of probability discernibility matrix based on Definition 6: where is empty set, .

Because the discernibility matrix is symmetric, we give only the lower triangular elements of discernibility matrix . We can see that , , and are all single attributes according to probability discernibility matrix , so they are core attributes. The approximate accuracy of the incomplete information system in Table 1 with respect to condition attribute is . The approximate accuracy remains unchanged even when attribute is deleted. So the incomplete information system in Table 1 can be reduced to , , and .

4.2. Reduction for Fuzzy Information System

We can realize attribute reduction for a fuzzy information system based on the fuzzy discernibility matrix, according to the definitions and theorems in Section 3.2. Table 2 is a fuzzy attribute information system, in which , , and are condition attributes and is the decision attribute.

Given a fuzzy information system, the similarity between rules and is computed first, and then the fuzzy discernibility matrix can be determined based on Definition 12. Based on , we construct a new discernibility matrix according to the following steps.

Initialize the elements of to be empty sets.

For each element, compute . If , the corresponding attribute is added to element .

The new matrix has the traditional form of a discernibility matrix. Finally, the attribute reduction of the fuzzy information system can be obtained from the discernibility function, which is composed of logical operations among the elements.
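
Once the matrix is in traditional form, the last step can be sketched in Python. The sketch assumes the usual reading of the discernibility function as a conjunction of disjunctions, one per nonempty element, so that the reductions are the minimal attribute sets intersecting every nonempty element. It enumerates subsets by brute force, which is suitable only for small attribute sets; the function name `reducts` and the sample elements are hypothetical.

```python
from itertools import combinations

def reducts(entries, attrs):
    """Minimal attribute sets that intersect every nonempty matrix element."""
    clauses = [frozenset(e) for e in entries if e]
    found = []
    # enumerate by increasing size so that smaller solutions are found first
    for size in range(len(attrs) + 1):
        for subset in combinations(sorted(attrs), size):
            s = set(subset)
            if any(r <= s for r in found):
                continue  # contains a smaller solution, so not minimal
            if all(s & c for c in clauses):
                found.append(frozenset(s))
    return [sorted(r) for r in found]

# Hypothetical elements of a traditional-form discernibility matrix
entries = [{"a"}, {"b", "c"}, {"a", "c"}, set()]
result = reducts(entries, {"a", "b", "c"})
print(result)  # [['a', 'b'], ['a', 'c']]
```

Attribute a appears alone in one element, so it occurs in every reduction, which matches the rule that single attributes are core attributes.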

The discernibility matrix (lower triangle) can be computed based on the similarity measure of closeness degree. The following elements of the discernibility matrix are obtained from the fuzzy attribute system in Table 2 ( is the empty set): We can conclude that attributes , are single attributes, so , are core attributes.

The discernibility matrix can be reduced to based on the discernibility function .

5. Conclusion

In this paper, a new concept, the soft discernibility matrix, is proposed, and related theorems as well as corresponding applications are given. Because the soft discernibility matrix can be used for attribute reduction of incomplete information systems and fuzzy information systems, it can provide an important research basis for attribute reduction of real-valued attribute information systems, and the proposed method has wide practical applicability. The new concept and the related method for attribute reduction advance the study of information systems and data mining.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.