Abstract

Attribute reduction is one of the challenging problems facing the effective application of computational intelligence technology for artificial intelligence. Its task is to eliminate dispensable attributes and search for a feature subset that possesses the same classification capacity as the original attribute set. To accomplish efficient attribute reduction, many heuristic search algorithms have been developed. Most of them are based on the model in which the approximation of all the target concepts associated with a decision system is decomposed into the approximation of each single target concept, represented by a pair of definable concepts known as lower and upper approximations. This paper proposes a novel model called macroscopic approximation, which considers all the target concepts as an indivisible whole to be approximated by the rough set boundary region derived from inconsistent tolerance blocks, as well as an efficient approximation framework called positive macroscopic approximation (PMA), which addresses macroscopic approximations with respect to a series of attribute subsets. Based on PMA, a fast heuristic search algorithm for attribute reduction in incomplete decision systems is designed; it achieves obviously better computational efficiency than other available algorithms, as the experimental results also demonstrate.

1. Introduction

Rough set theory (RST) [1] is a powerful mathematical tool for dealing with imprecision, uncertainty, and vagueness. As an extension of traditional set theory supporting approximation in decision making, RST provides a well-established model in which the approximation of an indefinable target concept is represented by a pair of definable concepts known as lower and upper approximations. In recent years, more and more attention has been paid to RST, and its successful applications already cover a variety of fields such as artificial intelligence, machine learning, and knowledge discovery [2–6].

Attribute reduction is one of the key topics in RST, viewed as the strongest and most important result to distinguish RST from other theories [7]. Its task is just to eliminate reducible or dispensable attributes and search for a feature subset with the same classification capacity as that of the original attribute set. Much use has been made of attribute reduction as a preprocessing stage prior to classification of decision systems, making analysis algorithms more efficient and learned classifiers more compact.

In attribute reduction, we encounter four general search strategies. The most intuitive one is the exhaustive search, which checks all the possible candidate subsets and retrieves those that satisfy the given criteria. The exhaustive search results in high time complexity, and finding a minimal reduct has been proved to be an NP-hard problem [8]. An alternative strategy, the incomplete search, applies mapping and pruning techniques to minimization; it is achieved by mapping pertinent elements to a structured model and pruning useless branches in the search space [9, 10]. Similar to the exhaustive search, the incomplete search finds the minimal reduct at the expense of great computational effort. The third strategy for attribute reduction conducts a random search using techniques such as the genetic algorithm [11], ant colony optimization [12], and particle swarm optimization [13]. The random search provides a robust solution but is also computationally very expensive. The fourth and most practical strategy discovers feature subsets by the heuristic search, where attributes with high quality are preferred as heuristics according to an evaluation function [14–18]. The heuristic search has the ability to seek out an optimal or suboptimal reduct with acceptable computational complexity, playing an important role in the attribute reduction community.

To accomplish efficient attribute reduction, much work has been devoted to developing heuristic search algorithms. From the viewpoint of evaluation functions, they can be classified into three main categories: positive region-based reduction [19–25], combination-based reduction [26–30], and entropy-based reduction [31–33].

The positive region-based reduction takes the change of rough set positive region caused by the addition of an attribute as the significance of the attribute. Attributes with the highest significance are selected as heuristics to guide the search process. One of the classic instances is the quick reduct algorithm proposed by Chouchoulas and Shen [19], which can pick the best path to a reduct from the whole search space and has received many improved versions [20–22]. By using decomposition and sorting techniques to calculate the positive region, Meng and Shi [23] put forward a fast positive region-based algorithm for feature selection from incomplete decision systems. Qian et al. [24, 25] constructed an efficient accelerator for heuristic search using a series of positive regions to approximate a given target decision on the gradually reduced universe. It can be incorporated into heuristic attribute reduction algorithms and make the modified versions capable of greatly reducing computational time.

Unlike the positive region-based reduction, the combination-based reduction considers the positive region as well as other available information such as rule support and boundary region. An attribute is evaluated by a combined measure generated from the positive region and the additional information. With consideration of the overall quality of the potential set of rules, Zhang and Yao [27] introduced rule support into the evaluation function and proposed a support-oriented algorithm called parameterized average support heuristic (PASH), which selects features yielding high average rule support over all decision classes. Parthaláin and Shen [28] used a distance metric to quantify the objects in the boundary region with regard to their proximity to the lower approximation and presented the distance metric-assisted tolerance rough set attribute reduction algorithm, which employs a new evaluation measure created by combining the distance metric and the dependency degree.

Different from the above two categories, the entropy-based reduction builds its evaluation functions from the information view, using measures such as combination entropy and rough entropy. Qian and Liang [32] presented the concept of combination entropy for describing the uncertainty of information systems and used its conditional entropy to select a feature subset. Sun et al. [33] utilized rough entropy-based uncertainty measures to evaluate the roughness and accuracy of knowledge and then constructed a heuristic search algorithm with low computational complexity for attribute reduction in incomplete decision systems.

These investigations have offered interesting insights into attribute reduction. When dealing with large incomplete data, however, they still suffer from computational inefficiency. A more efficient and feasible attribute reduction approach is therefore desirable; this paper intends to provide such a solution.

One can observe that most heuristic attribute reduction algorithms are based on the model in which the approximation of all the target concepts from a decision system is decomposed into the approximation of each single target concept, represented by lower and upper approximations. Little work has hitherto taken the approximation problem into account at the macroscopic level. In this paper, we propose a novel model called macroscopic approximation, which considers all the target concepts of a decision system as an indivisible whole to be approximated by the rough set boundary region derived from inconsistent blocks, as well as an efficient approximation framework called positive macroscopic approximation (PMA), which addresses macroscopic approximations with respect to a series of attribute subsets. Based on PMA, a fast heuristic attribute reduction algorithm for incomplete decision systems is designed; it achieves obviously better computational efficiency than other available algorithms, as the experimental results also demonstrate.

The remainder of this paper is organized as follows. In Section 2, we review some basic concepts of RST and outline the quick reduct algorithm. Section 3 investigates macroscopic approximation and positive macroscopic approximation. In Section 4, a fast heuristic attribute reduction algorithm based on positive macroscopic approximation is devised and illustrated by a worked example. In Section 5, some experiments are conducted to validate the time efficiency of the proposed algorithm. Finally, we give a concise conclusion in Section 6.

2. Preliminaries

In this section, we briefly recall some basic concepts, such as incomplete decision system, tolerance relation, tolerance block, tolerance class, positive region, and boundary region, together with the quick reduct algorithm, needed in the following sections.

2.1. Basic Concepts

A decision system is an information system with distinction between decision attributes and condition attributes. It is generally formulated by a data table where columns are referred to as attributes and rows as objects of interest. If there exist objects that contain missing data, the decision system is incomplete; otherwise, it is complete.

An incomplete decision system is a tuple IS = (U, A, V, f), where U, called the universe of discourse, is a nonempty finite object set and A is an attribute set that consists of a condition attribute subset C and a decision attribute subset D. For any a ∈ A, there is a mapping a: U → V_a, where V_a is the value domain of a. V_C (where "*" ∈ V_C stands for missing values) and V_D are the value domains of C and D, respectively. For convenience, the tuple is usually denoted as IS = (U, C ∪ D).

Let IS = (U, C ∪ D) be an incomplete decision system. For any subset B ⊆ C, B determines a binary relation, denoted by T(B), which is defined as

T(B) = {(x, y) ∈ U × U : ∀a ∈ B, a(x) = a(y) or a(x) = * or a(y) = *}.

It is easily known that the binary relation T(B) is reflexive and symmetric but not necessarily transitive, so it is a tolerance relation. Using the B-tolerance relation, the universe can be divided into many tolerance blocks such that U = ∪{β : β ∈ U/T(B)}, where β is a B-tolerance block and U/T(B) is the family of all the B-tolerance blocks on U, called a B-approximation space. A B-tolerance block depicts a collection of objects which are possibly indiscernible from each other with respect to B. If there does not exist another B-tolerance block β′ such that β ⊂ β′, then β is called a maximal B-tolerance block [34].

For any object x ∈ U, the B-tolerance relation determines the tolerance class of x, denoted by T_B(x), that is, T_B(x) = {y ∈ U : (x, y) ∈ T(B)}, describing the maximal set of objects which are probably indistinguishable from x with respect to B. There is a relationship between the tolerance class and the maximal tolerance block, shown as follows [34]:

T_B(x) = ∪{β : β ∈ M_B(x)},

where M_B(x) is the family of maximal B-tolerance blocks containing x. Then it is easy to prove that

T_B(x) = ∪{β : β ∈ U/T(B), x ∈ β},

where {β ∈ U/T(B) : x ∈ β} is the family of B-tolerance blocks containing x.
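To make these notions concrete, the following minimal Python sketch (an illustration of ours, not part of the original formulation) tests the tolerance relation and computes tolerance classes; it assumes that objects are dictionaries mapping attribute names to values and that "*" marks missing values, as above.

    MISSING = '*'  # marker for missing values, as in the paper

    def tolerant(x, y, B):
        # (x, y) ∈ T(B) iff, for every a in B, the values match
        # or at least one of them is missing
        return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in B)

    def tolerance_class(U, i, B):
        # T_B(x_i): indices of all objects probably indistinguishable
        # from x_i with respect to B
        return {j for j in range(len(U)) if tolerant(U[i], U[j], B)}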

Consider a partition U/D = {d_1, d_2, …, d_m} of U determined by D. U/D is the family of all the decision classes derived from the decision system. Each decision class d_i can be viewed as a target concept approximated by a pair of precise concepts known as the lower and upper approximations. The dual approximations of a target concept d_i are defined, respectively, as

B_*(d_i) = {x ∈ U : T_B(x) ⊆ d_i},
B^*(d_i) = {x ∈ U : T_B(x) ∩ d_i ≠ ∅}.

The lower approximation B_*(d_i) is regarded as the maximal B-definable set contained in d_i, whereas the upper approximation B^*(d_i) is considered as the minimal B-definable set containing d_i. If B_*(d_i) = B^*(d_i), then d_i is a B-exact set; otherwise, it is a B-rough set.

By the dual approximations, the universe of the decision system is partitioned into two mutually exclusive crisp regions, the positive region and the boundary region, defined, respectively, as

POS_B(D) = ∪{B_*(d_i) : d_i ∈ U/D},
BND_B(D) = ∪{B^*(d_i) : d_i ∈ U/D} − POS_B(D).

It can be perceived that the positive region is the collection of objects which are classified without any ambiguity into the target concepts using the B-tolerance relation, while the boundary region is, in a sense, the undeterminable area of the universe, where none of the objects are classified with certainty into the target concepts as far as B is concerned. It is apparent that POS_B(D) ∪ BND_B(D) = U and POS_B(D) ∩ BND_B(D) = ∅. If BND_B(D) = ∅ or POS_B(D) = U, we say that IS is consistent; otherwise, it is inconsistent.
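Continuing the sketch above (still our illustration, assuming a single decision attribute d), the two regions follow directly from tolerance classes: an object is positive exactly when its whole tolerance class carries one decision value.

    def regions(U, B, d):
        # returns (POS_B(D), BND_B(D)) as sets of object indices
        pos = set()
        for i in range(len(U)):
            tc = tolerance_class(U, i, B)
            if len({U[j][d] for j in tc}) == 1:  # T_B(x_i) lies inside one decision class
                pos.add(i)
        return pos, set(range(len(U))) - pos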

2.2. Rough Set Attribute Reduction

In the RST environment, the dependency degree of the decision attribute set D on the condition attribute set B is definable in terms of positive region:

γ_B(D) = |POS_B(D)| / |U|,

where |POS_B(D)| and |U| are the cardinalities of POS_B(D) and U, respectively. If γ_B(D) = 1, we say that D totally depends on B. If 0 < γ_B(D) < 1, we say that D partially depends on B. If γ_B(D) = 0, we say that D is completely independent of B. RST describes the variation of dependency degree caused by the addition of an attribute a ∈ C − B to B as the significance of a such that

Sig(a, B, D) = γ_{B∪{a}}(D) − γ_B(D).

The bigger the significance value is, the more informative the attribute a is. Accordingly, the quick reduct algorithm [19], regarded as a classical rough set attribute reduction algorithm, is constructed by iteratively adding the attribute with the highest significance to an attribute pool which begins with an empty set until the dependency value of the pool equals that of the set of the whole condition attributes. This process is outlined by Algorithm 1.

Inputs:
U: the universe
C = {a_1, a_2, …, a_n}, a condition attribute set
D = {d}, a decision attribute set
Output:
red: a reduct of IS
Begin
Step  1.  Let red ← ∅ and γ_red(D) ← 0
Step  2.  While γ_red(D) ≠ γ_C(D) and red ≠ C do
     Select an attribute a with the maximal Sig(a, red, D), where a ∈ C − red
     red ← red ∪ {a}
     Compute γ_red(D)
Step  3. Output red
End.
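A runnable Python rendering of Algorithm 1, reusing the sketches above, might look as follows; it is only an illustration of the greedy loop, not the authors' implementation.

    def dependency(U, B, d):
        # γ_B(D) = |POS_B(D)| / |U|
        pos, _ = regions(U, B, d)
        return len(pos) / len(U)

    def quick_reduct(U, C, d):
        red, target = set(), dependency(U, set(C), d)
        while dependency(U, red, d) != target and red != set(C):
            # add the attribute with the highest significance Sig(a, red, D)
            best = max(set(C) - red, key=lambda a: dependency(U, red | {a}, d))
            red.add(best)
        return red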

3. Positive Macroscopic Approximation

It is well known that approximation is one of the core ideas of RST. A target concept is approximated by the lower and upper approximations. Likewise, a decision system can be viewed as a super concept to be approximated by the lower and upper super approximations, where the family of all the target concepts from the decision system is considered as the super concept, the positive region or the complementary set of the boundary region of the decision system acts as the lower super approximation, and the universe of the decision system serves as the upper super approximation. If the lower super approximation equals the upper super approximation, the decision system is consistent or exact; otherwise, it is inconsistent or rough. From this observation, it is easily understood that the approximation of a decision system can be represented by the positive region or the boundary region.

RST offers a feasible way to obtain the approximation of a decision system by means of that of a single target concept represented by the lower and upper approximations defined in Section 2.1. For convenience, this model is called microscopic approximation. As opposed to microscopic approximation, this section introduces a novel model called macroscopic approximation, where the approximation of a decision system is achieved by regarding all the target concepts as an inseparable entity to be approximated by the boundary region. Furthermore, we explore positive macroscopic approximation (PMA), which considers macroscopic approximations with respect to a series of attribute sets.

3.1. Macroscopic Approximation

Macroscopic approximation is an alternative way to arrive at the approximation of a decision system by the boundary region. Because all the target concepts associated with a decision system are considered integrally, the lower and upper approximations of single concepts are unavailable, and a new avenue to macroscopic approximation is needed. The inconsistent tolerance block, introduced in the following context, offers such a feasible solution.

Definition 1. Let IS = (U, C ∪ D), B ⊆ C, and β ∈ U/T(B). Then β is said to be an inconsistent tolerance block (IT-block) if |β/D| > 1 or a consistent tolerance block (CT-block) if |β/D| = 1, where β/D = {β ∩ d_i : d_i ∈ U/D, β ∩ d_i ≠ ∅} and |β/D| is the cardinality of β/D.

An IT-block describes a set of B-definable objects with diverse class labels, implying that a group of B-indistinguishable objects have divergent decisions, whereas a CT-block depicts a collection of B-definable objects with the same class label, indicating that a group of B-indiscernible objects share the same decision. Accordingly, the blocks of U/T(B) are classified into two mutually exclusive crisp subfamilies. One is the consistent family, denoted by CF_B, collecting all the CT-blocks from U/T(B) such that CF_B = {β ∈ U/T(B) : |β/D| = 1}. The other is the inconsistent family, denoted by IF_B, gathering all the IT-blocks from U/T(B) such that IF_B = {β ∈ U/T(B) : |β/D| > 1}. Obviously, CF_B ∪ IF_B = U/T(B) and CF_B ∩ IF_B = ∅.
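In code, the CT/IT distinction of Definition 1 reduces to a one-line test (our sketch; block is a set of object indices and d the decision attribute):

    def is_consistent(block, U, d):
        # CT-block iff |β/D| = 1, i.e. one decision value over the whole block
        return len({U[i][d] for i in block}) == 1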

It is worth noting the distinction between the boundary region and the IT-block. The former consists of objects whose tolerance classes cannot be entirely contained in any target concept, while the latter is an entity that overlaps two or more target concepts. The following lemmas are used to investigate the relationship between them.

Lemma 2. Let IS = (U, C ∪ D) and B ⊆ C. For any x ∈ BND_B(D), there must exist at least one IT-block containing x.

Proof. This proof is done by contradiction. For any x ∈ BND_B(D), suppose that there does not exist any IT-block containing x. That is to say, any β ∈ U/T(B) containing x is a CT-block, which implies that for each such β there must exist some decision value v (v = d(x), since x ∈ β) such that d(y) = v for all y ∈ β. v corresponds to a certain decision class d_i ∈ U/D containing all the objects whose decision values are equal to v, and then β ⊆ d_i and T_B(x) = ∪{β ∈ U/T(B) : x ∈ β} ⊆ d_i. So, x ∈ POS_B(D). This means that x ∉ BND_B(D), contradicting x ∈ BND_B(D). Hence, for any x ∈ BND_B(D), there must exist at least one IT-block containing x. The lemma holds.

Lemma 3. Let IS = (U, C ∪ D) and B ⊆ C. For any β ∈ IF_B, β ⊆ BND_B(D).

Proof. For any x ∈ β, we have β ⊆ T_B(x), where T_B(x) = ∪{β′ ∈ U/T(B) : x ∈ β′}. Since β ∈ IF_B, there exist y, z ∈ β such that d(y) ≠ d(z) (or, if d(y) = d(z) for any y, z ∈ β, then |β/D| = 1, and thereby β ∈ CF_B, contradicting β ∈ IF_B). Hence y, z ∈ T_B(x), which gives that T_B(x) meets the two distinct decision classes containing y and z. As the decision classes in U/D are pairwise disjoint, this implies that T_B(x) ⊄ d_i for any d_i ∈ U/D. Therefore, for any x ∈ β, x ∉ POS_B(D), that is, x ∈ BND_B(D). This means that β ⊆ BND_B(D) for any β ∈ IF_B. The lemma holds.

Theorem 4. Let IS = (U, C ∪ D) and B ⊆ C. Then BND_B(D) = ∪{β : β ∈ IF_B}.

Proof. For any x ∈ BND_B(D), according to Lemma 2, there must exist at least one tolerance block β containing x such that β ∈ IF_B, which yields BND_B(D) ⊆ ∪{β : β ∈ IF_B}. On the other hand, for any β ∈ IF_B, by Lemma 3, we have β ⊆ BND_B(D), which gives ∪{β : β ∈ IF_B} ⊆ BND_B(D). In summary, BND_B(D) = ∪{β : β ∈ IF_B}. The theorem holds.

Theorem 4 reveals that the boundary region is the union of the IT-blocks, which is in essence the materialization of macroscopic approximation: all the target concepts are integrally approximated by the boundary region derived from IT-blocks. Unlike the microscopic approximation built on the dual approximations of Section 2.1, the macroscopic approximation is directly constructed on the elementary members of the approximation space rather than the lower and upper approximations, making the calculation of the approximation of the decision system more efficient.
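Theorem 4 translates directly into code: given any family of B-tolerance blocks, the boundary region is simply the union of the inconsistent ones. A small sketch under the same assumptions as above:

    def boundary_from_blocks(blocks, U, d):
        # BND_B(D) = ∪{β : β ∈ IF_B} (Theorem 4)
        it_blocks = [b for b in blocks if not is_consistent(b, U, d)]
        return set().union(*it_blocks)  # the empty union yields the empty set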

3.2. PMA

For a given decision system, we can build a series of attribute subsets according to the following rule: the first set has only one attribute selected from the set of the whole attributes; the union of the first set and another attribute chosen from the remaining attributes is regarded as the second set; and so on. The newly generated attribute subsets form an ascending sequence, called a positive sequence. If a sequence is organized in descending order, it is called a converse sequence. Assume that attribute subsets B_i and B_j are two arbitrary elements of a positive sequence. If B_j contains B_i, we say that the tolerance relation determined by B_j is finer than that determined by B_i. Conversely, if B_j is contained in B_i, we say that the tolerance relation determined by B_j is coarser than that determined by B_i. Accordingly, a positive sequence determines a train of tolerance relations stretching from coarse to fine.

In this subsection, we explore positive macroscopic approximation (PMA), which addresses macroscopic approximations with respect to a positive sequence. To construct an efficient PMA, the following relevant definitions and lemmas are needed.

Definition 5. Let IS = (U, C ∪ D), B ⊆ C, a ∈ C − B, and β ∈ U/T(B). Then the family of subblocks determined by a on β is defined as

β/T(a) = {β′ : β′ is an {a}-tolerance block on β},

so that β = ∪{β′ : β′ ∈ β/T(a)}. The consistent and inconsistent subfamilies of β/T(a) are defined as CF(β/T(a)) = {β′ ∈ β/T(a) : |β′/D| = 1} and IF(β/T(a)) = {β′ ∈ β/T(a) : |β′/D| > 1}, respectively. Apparently, CF(β/T(a)) ∪ IF(β/T(a)) = β/T(a) and CF(β/T(a)) ∩ IF(β/T(a)) = ∅.
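For a single attribute, the block splitting of Definition 5 is easy to realize under the tolerance semantics: objects sharing a concrete value group together, and objects with a missing value join every group. The sketch below is our illustration of this operation:

    def split_by_attribute(block, U, a):
        # the family β/T(a) of {a}-tolerance subblocks of β
        values = {U[i][a] for i in block if U[i][a] != MISSING}
        if not values:                # all values missing: β stays whole
            return [set(block)]
        return [{i for i in block if U[i][a] in (v, MISSING)} for v in values]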

Lemma 6. Let IS = (U, C ∪ D), B ⊆ C, a ∈ C − B, and β ∈ U/T(B) a CT-block. Then any β′ ∈ β/T(a) is a CT-block and CF(β/T(a)) = β/T(a), IF(β/T(a)) = ∅.

Proof. Since β is a CT-block, |β/D| = 1. Suppose that β/D = {β ∩ d_i} (d_i ∈ U/D); then y ∈ d_i for any y ∈ β. β′ ∈ β/T(a) means that β′ ⊆ β, so we have y ∈ d_i for any y ∈ β′ and β′ ∈ β/T(a). Hence β′/D = {β′ ∩ d_i} and |β′/D| = 1, which yields that β′ is a CT-block. Moreover, CF(β/T(a)) = β/T(a) and IF(β/T(a)) = ∅. The lemma holds.

This lemma shows that any subblock derived from a CT-block is also a CT-block. In other words, inconsistent tolerance subblocks are derivable only from IT-blocks.

Definition 7. Let IS = (U, C ∪ D), B ⊆ C, a ∈ C − B, and Φ ⊆ U/T(B) a family of B-tolerance blocks. Then the family of subblocks determined by a on Φ is defined as

Φ/T(a) = ∪{β/T(a) : β ∈ Φ}.

The consistent and inconsistent subfamilies of Φ/T(a) are defined as CF(Φ/T(a)) = ∪{CF(β/T(a)) : β ∈ Φ} and IF(Φ/T(a)) = ∪{IF(β/T(a)) : β ∈ Φ}, respectively. Clearly, CF(Φ/T(a)) ∪ IF(Φ/T(a)) = Φ/T(a) and CF(Φ/T(a)) ∩ IF(Φ/T(a)) = ∅.

Lemma 8. Let IS = (U, C ∪ D), B ⊆ C, a ∈ C − B, and Φ ⊆ U/T(B) a family of CT-blocks. Then any β′ ∈ Φ/T(a) is a CT-block and CF(Φ/T(a)) = Φ/T(a), IF(Φ/T(a)) = ∅.

Proof. For any β ∈ Φ, β is a CT-block. By Lemma 6, any β′ ∈ β/T(a) is also a CT-block, which yields that any β′ ∈ Φ/T(a) is a CT-block. Since Φ/T(a) = ∪{β/T(a) : β ∈ Φ}, CF(Φ/T(a)) = Φ/T(a) and IF(Φ/T(a)) = ∅. The lemma holds.

Theorem 9. Let IS = (U, C ∪ D), B ⊆ C, and a ∈ C − B. Then IF_{B∪{a}} = IF(IF_B/T(a)).

Proof. Assume that Φ = U/T(B) = CF_B ∪ IF_B. By Definitions 5 and 7, we have U/T(B ∪ {a}) = Φ/T(a) and Φ/T(a) = CF_B/T(a) ∪ IF_B/T(a). Then IF_{B∪{a}} = IF(CF_B/T(a)) ∪ IF(IF_B/T(a)). By Lemma 8, IF(CF_B/T(a)) = ∅, and then IF_{B∪{a}} = IF(IF_B/T(a)). The theorem holds.

Theorem 9 indicates that IF_{B∪{a}} can be deduced from IF_B, which implies that the IT-blocks determined by the finer tolerance relation can be achieved by successively decomposing the IT-blocks determined by the coarser tolerance relation, regardless of the corresponding CT-blocks. By Theorem 9, we arrive at an efficient PMA, embodied in Definition 10.

Definition 10. Let IS = (U, C ∪ D) with C = {a_1, a_2, …, a_n}, and let P = (B_1, B_2, …, B_n) be a positive sequence, where B_1 = {a_1}, B_2 = B_1 ∪ {a_2}, …, and B_n = B_{n−1} ∪ {a_n} = C. Then PMA with respect to P can be defined as

PMA(P) = (BND_{B_1}(D), BND_{B_2}(D), …, BND_{B_n}(D)),

where BND_{B_i}(D) = ∪{β : β ∈ IF_{B_i}}, IF_{B_i} = IF(IF_{B_{i−1}}/T(a_i)), and IF_{B_0} = {U}.

PMA is in fact a sequence of boundary regions, each of which denotes the approximation of the decision system with respect to some attribute subset. Since the tolerance relations determined by the positive sequence run from coarse to fine, it is easily proved that the corresponding boundary regions stretch from broad to narrow. In other words, PMA is a sequence of gradually reduced boundary regions. When B_n = C, PMA portrays in detail the evolution of the boundary region becoming narrower and narrower until it reaches the boundary region with respect to the set of the whole condition attributes.

PMA provides an efficient approximation framework where a decision system is consecutively approximated by the boundary region derived from IT-blocks according to a positive sequence. This mechanism can be visualized in Figure 1.

PMA considers the universe as a root IT-block and evaluates the boundary regions by repeatedly dividing the IT-blocks into smaller ones along the positive sequence. For the attribute set B_1 = {a_1}, the attribute a_1 works on the root IT-block and then outputs the consistent family CF_{B_1} and the inconsistent family IF_{B_1}. The former is pruned, while the latter is used to produce the boundary region BND_{B_1}(D) and serves as the father IT-blocks for the next operation. As for B_2, CF_{B_2} and IF_{B_2} are generated by employing a_2 to operate on IF_{B_1}, and then BND_{B_2}(D) is derived from IF_{B_2}. Likewise, we can obtain BND_{B_3}(D), …, BND_{B_n}(D), calculated by using a_3, …, a_n to split the inherited IT-blocks, along with IF_{B_3}, …, IF_{B_n} induced by them. The detailed steps of this process are shown in Algorithm 2.

Inputs:
U: the universe
C = {a_1, a_2, …, a_n}, a condition attribute set
D = {d}, a decision attribute set
Output:
Ψ: a family of boundary regions
Begin
Step  1. Ψ ← ∅, IF_{B_0} ← {U}   // regard the universe as a root IT-block
Step  2. For i = 1 to n do
      B_i ← B_{i−1} ∪ {a_i}   // generate an element of a positive sequence (B_0 = ∅)
      IF_{B_i} ← IF(IF_{B_{i−1}}/T(a_i))   // derive IT-blocks with respect to B_i from those with respect to B_{i−1}
      BND_{B_i}(D) ← ∪{β : β ∈ IF_{B_i}}   // obtain the boundary region by means of IT-blocks
      Ψ ← Ψ ∪ {BND_{B_i}(D)}   // add the boundary region to the sequence
Step  3. Output Ψ
End.

PMA starts with an empty sequence; boundary regions with respect to a positive sequence are added incrementally. This process continues until all the attributes are traversed. For each loop, the boundary region is derivable from the corresponding IT-blocks obtained by operating on their predecessors with a single attribute.
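Putting the pieces together, Algorithm 2 admits a compact Python sketch (again ours): a single pass over the attributes in which only the inherited IT-blocks are split, CT-blocks are pruned, and each round's boundary region is recorded.

    def pma(U, C, d):
        # returns the sequence (BND_{B_1}(D), ..., BND_{B_n}(D))
        root = set(range(len(U)))
        it_blocks = [] if is_consistent(root, U, d) else [root]
        boundaries = []
        for a in C:                   # positive sequence B_1 ⊂ B_2 ⊂ ... ⊂ B_n
            it_blocks = [sb for b in it_blocks
                            for sb in split_by_attribute(b, U, a)
                            if not is_consistent(sb, U, d)]  # prune CT-blocks
            boundaries.append(set().union(*it_blocks))
        return boundaries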

There are several highlights of PMA. First, the inconsistent family with respect to an attribute subset is inherited from its predecessor rather than computed from scratch, which makes efficient use of the intermediate results. Second, the cardinality of the boundary region with respect to a fine tolerance relation is not more than that with respect to a coarse one, indicating that PMA works on a gradually reduced universe. Finally, obtaining a series of boundary regions requires traversing the entire condition attribute set only once. All these advantages help PMA achieve better computational efficiency, which becomes visible in the process of attribute reduction addressed in the next section. The following example illustrates this idea.

Example 11. Consider an incomplete decision system IS shown in Table 1, where U is the object set listed in Table 1, C = {a_1, a_2, a_3, a_4, a_5}, and D = {d}. The attributes a_1, a_2, a_3, a_4, and a_5 stand for price, mileage, size, max-speed, and acceleration, respectively. Note that Table 1 is somewhat different from that in the literature [23, 34].

We can build a positive sequence P = (B_1, B_2, …, B_5) such that B_i = B_{i−1} ∪ {a_i}, where B_1 = {a_1}, B_2 = {a_1, a_2}, B_3 = {a_1, a_2, a_3}, and so on up to B_5 = C. Applying a_1 to the universe, we can get IF_{B_1} and then BND_{B_1}(D). For B_2, the attribute a_2 is used to operate on IF_{B_1}, generating IF_{B_2} and BND_{B_2}(D). Similarly, the results with respect to B_3, B_4, and B_5 are produced. Therefore, PMA with respect to P is achieved. It is clear that BND_{B_1}(D) ⊇ BND_{B_2}(D) ⊇ ⋯ ⊇ BND_{B_5}(D), which confirms the fact that the boundary region of PMA is gradually reduced in accordance with the positive sequence and finally catches up to the minimal boundary region (BND_C(D)).

4. PMA-Based Attribute Reduction

As mentioned previously, PMA offers a sequence of boundary regions in descending order. If each selected attribute is so informative that it narrows the boundary region remarkably, the boundary region with respect to some attribute subset can catch up with the boundary region with respect to the set of all the attributes. In other words, a reduct is the attribute subset with the minimal number of attributes that creates the same approximation of the decision system as the original attribute set. Following this observation, we design a heuristic attribute reduction algorithm based on PMA, called PMA-AR. Before elaborating PMA-AR, we give a redefinition of the attribute dependency.

As stated in Section 2.2, the dependency degree is definable in terms of positive region. Since positive region and boundary region are complementary within the universe of a decision system, the dependency degree can also be defined in terms of boundary region.

Definition 12. Let IS = (U, C ∪ D), B ⊆ C, and a ∈ C − B; the dependency degree of the decision attribute set D on the condition attribute set B is defined as

γ_B(D) = 1 − |BND_B(D)| / |U|.

Then the significance of the attribute a is equivalently redefined as

Sig(a, B, D) = γ_{B∪{a}}(D) − γ_B(D) = (|BND_B(D)| − |BND_{B∪{a}}(D)|) / |U|.

The significance expresses how much the boundary region will be affected if the attribute a is added to the set B. In general, an attribute with a maximal significance value is preferentially selected to guide the search process.
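Under Definition 12, the significance can be read off from boundary sizes. The direct (non-incremental) sketch below is for reference only; PMA-AR itself evaluates the same quantity from the inherited IT-blocks, as the next algorithm shows. B is assumed to be a set of attribute names.

    def significance(U, B, a, d):
        # Sig(a, B, D) = (|BND_B(D)| − |BND_{B∪{a}}(D)|) / |U|
        _, bnd_b = regions(U, B, d)
        _, bnd_ba = regions(U, B | {a}, d)
        return (len(bnd_b) - len(bnd_ba)) / len(U)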

PMA-AR is in essence an extension of the quick reduct algorithm outlined previously. It marries PMA with the boundary region-based significance: the former provides an efficient way to compute the boundary region, and the latter acts as a router to determine the optimal search path. This effective combination allows PMA-AR to locate a reduct efficiently. Algorithm 3 gives the detailed description of PMA-AR.

Inputs:
U: the universe
C = {a_1, a_2, …, a_n}, a condition attribute set
D = {d}, a decision attribute set
Output:
red: a reduct of IS
Begin
Step  1. red ← ∅, IF_red ← {U}
Step  2. Compute BND_C(D) by Algorithm 2
Step  3. While γ_red(D) ≠ γ_C(D) and red ≠ C do {
    sig_max ← −1
    For j = 1 to |C − red| do {
     a_j   // denotes the No. j attribute in C − red
     IF_j ← IF(IF_red/T(a_j))
     BND_j ← ∪{β : β ∈ IF_j}
     If Sig(a_j, red, D) > sig_max then {
        sig_max ← Sig(a_j, red, D), a_best ← a_j
        IF_best ← IF_j
     } //end If
    }//end For
   red ← red ∪ {a_best}, IF_red ← IF_best
   }//end While
Step  4. Output red
End.
PMA-AR is constructed on PMA and employs the boundary region-based significance as the evaluation function to determine the positive sequence, by which a new attribute subset is achieved. Each of the unselected attributes is used to work on the IT-blocks generated by the current attribute subset, and the corresponding boundary region is then derived from the resulting IT-blocks. By evaluating the boundary region-based significance, the attribute with the biggest significance value is selected and added to the current attribute subset, which creates the expected attribute subset. This process continues until the boundary region with respect to the newly generated attribute subset equals the boundary region with respect to the set of all the attributes; equivalently, the dependency degree of the decision attribute set on the newly generated attribute subset is equal to that of the decision attribute set on the original attribute set.
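The whole of PMA-AR then fits into a few lines of Python (a sketch under the assumptions above, not the authors' code): the greedy loop keeps only the inherited IT-blocks and, in each round, adds the attribute whose split leaves the smallest boundary region, which is equivalent to maximizing Sig.

    def pma_ar(U, C, d):
        target = len(pma(U, list(C), d)[-1])   # |BND_C(D)|, via Algorithm 2
        root = set(range(len(U)))
        it_blocks = [] if is_consistent(root, U, d) else [root]
        red, bnd_size = set(), len(set().union(*it_blocks))
        while bnd_size > target and red != set(C):
            best = None
            for a in set(C) - red:
                cand = [sb for b in it_blocks
                           for sb in split_by_attribute(b, U, a)
                           if not is_consistent(sb, U, d)]
                size = len(set().union(*cand))
                if best is None or size < best[0]:  # smaller boundary = larger Sig
                    best = (size, a, cand)
            bnd_size, chosen, it_blocks = best
            red.add(chosen)
        return red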

PMA-AR produces the shortest positive sequence together with the fastest evolution of the boundary regions from maximal to minimal, thereby uncovering a hidden reduct that uses the fewest attributes to describe the approximation of the decision system with respect to the set of all the attributes.

Note that the time complexity of PMA-AR is O(|C|²|U|), which is dependent on Step 2 with O(|C||U|) and Step 3 with O(|C|²|U|), where |U| and |C| are the cardinalities of U and C, respectively. In the worst case, where the reduct approaches C and the IT-blocks shrink slowly, it is theoretically possible that PMA-AR takes time O(|C|²|U|). In fact, according to Algorithm 3, if the reduct is much smaller than C or the boundary region shrinks rapidly, PMA-AR costs far less time, which implies that the practical time consumption is much less than O(|C|²|U|). Compared with the existing attribute reduction algorithms for incomplete decision systems [23, 33], PMA-AR achieves obviously lower time complexity. The following example is employed to illustrate the working process of PMA-AR.

Example 13. Consider the incomplete decision system IS described in Table 1, where C = {a_1, a_2, a_3, a_4, a_5} and D = {d}. The attributes a_1, a_2, a_3, a_4, and a_5 stand for price, mileage, size, max-speed, and acceleration, respectively.

Unlike Example 11, the positive sequence associated with PMA-AR is dynamically generated by iteratively selecting the attribute with the maximal significance value. To this end, the significance values of all the condition attributes are first computed from the root IT-block.

It is evident that the attribute with the maximal significance value can then be used to create the first attribute subset, together with its inconsistent family and boundary region. Now, this inconsistent family is regarded as the father IT-blocks to breed son IT-blocks by adding one of the remaining attributes to the subset. From the newly generated IT-blocks, the boundary region is derivable. By evaluating the boundary region-based significance, the next expected attribute can be selected from the set of the remaining attributes.

Thus, the attribute with the maximal significance value is again selected and used to generate another attribute subset. Since the boundary region of this two-attribute subset is equal to BND_C(D), the subset is just a reduct.

As a result, a reduct of the decision system consisting of the two selected attributes is obtained.

5. Experimental Evaluation

In the following, we carry out several experiments on a personal computer with Windows XP, a 2.53 GHz CPU, and 2.0 GB of memory so as to evaluate PMA-AR in terms of the number of selected features and running time.

There are many heuristic search algorithms for attribute reduction in incomplete decision systems [17, 20, 23–25, 28, 33], of which three state-of-the-art algorithms are appropriate for comparison with PMA-AR. They are the positive region-based algorithm (PRA) [23], the distance metric-assisted algorithm (DMA) [28], and the rough entropy-based algorithm (REA) [33], qualified as the representatives of positive region-based reduction, combination-based reduction, and entropy-based reduction, respectively.

Our experiments employ eight publicly accessible datasets from the UCI repository of machine learning databases [35]. Each of them is a discrete dataset with only one decision attribute. Since PMA-AR is designed to deal with incomplete data, the five complete datasets, namely, balance scale weight and distance, tic-tac-toe end game, car evaluation, chess end game, and nursery, are all turned into incomplete ones by randomly replacing some known attribute values with missing ones. In addition, an identifier attribute of standardized audiology is removed. The characteristics of these datasets are described in Table 2.

The experiments are performed by applying the four algorithms (PRA, DMA, REA, and PMA-AR) to the eight datasets shown in Table 2. The resulting numbers of selected features and the running times expressed in seconds are exhibited in Tables 3 and 4, respectively.

From Table 3, it can be observed that the number of features selected by PMA-AR is the same as that by PRA but not less than those by DMA and REA. On the whole, the numbers selected by the four algorithms are relatively close. This indicates that the performances of the four algorithms are very similar, though DMA and REA perform a little better than PMA-AR and PRA.

On the other hand, Table 4 shows that for each dataset, DMA needs the most time, PRA and REA take the second and third place, respectively, and PMA-AR needs the least. Moreover, the running time of DMA, PRA, and REA increases much more rapidly than that of PMA-AR. The differences can be illustrated by plotting the ratios of DMA, PRA, and REA to PMA-AR, respectively, as shown in Figure 2.

From Figure 2, we can find that the curve corresponding to DMA/PMA-AR increases most rapidly, and the curve corresponding to PRA/PMA-AR increases slightly more rapidly than that corresponding to REA/PMA-AR. The experimental result coincides with the theoretical analysis: the time complexity of DMA is the highest among all four algorithms, followed by that of PRA and then that of REA, while PMA-AR, whose time complexity is much lower than those of the other three, is the most efficient. It is verified that PMA-AR achieves the best performance in terms of time efficiency.

One can also observe that although each curve tends to increase with the size of the datasets, it is not strictly monotonic; namely, the curves fluctuate significantly. This can be seen from the case that the ratio of DMA to PMA-AR on Dataset 2 is higher than that on Dataset 3. The main reason is that the attribute numbers of the datasets are different and, more importantly, the numbers of selected features are also different. For example, Dataset 2 has 69 attributes, of which 21 features are selected, while Dataset 3 has 16 attributes, of which 8 features are selected. Furthermore, the curves also indicate that when the number of attributes is far less than that of objects, the running time mainly relies on the latter. This supports the conclusion that PMA-AR is more suitable for feature selection from large data than the other three algorithms because its running time is proportional to |U|.

6. Conclusions

Attribute reduction in incomplete decision systems has been a hotspot of rough set-based data analysis. To efficiently obtain a feature subset, many heuristic attribute reduction algorithms have been studied. Unfortunately, these algorithms are still computationally costly on large data. This paper has developed PMA-AR, which is able to find a feature subset with obviously better computational efficiency.

Unlike other algorithms featured by microscopic approximation, PMA-AR adopts a novel model called macroscopic approximation, which considers all the target concepts of a decision system as an indivisible whole to be approximated by the rough set boundary region derived from IT-blocks. Constructed on PMA, which serves as an accelerator for the calculation of boundary regions, PMA-AR is capable of efficiently identifying a reduct by using the boundary region-based significance as the evaluation function. Both theoretical analysis and experimental results demonstrate that PMA-AR indeed outperforms other available algorithms with regard to time efficiency.

Acknowledgment

This work is supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20090002110085).