Abstract

In recent years, the theory of decision-theoretic rough sets and its applications have been studied, including the attribute reduction problem. However, most researchers focus only on decision cost rather than test cost. In this paper, we study the attribute reduction problem with both types of costs in decision-theoretic rough set models. A new definition of attribute reduct is given, and the attribute reduction is formulated as an optimization problem, which aims to minimize the total cost of classification. Then both backtracking and heuristic algorithms for the new problem are proposed. The algorithms are tested on four UCI (University of California, Irvine) datasets. Experimental results demonstrate the efficiency and the effectiveness of both algorithms. This study provides a new insight into the attribute reduction problem in decision-theoretic rough set models.

1. Introduction

We are involved in decision making all the time. Most decisions are based on a group of criteria, so decision making is often aimed at finding a proper balance or tradeoff among multiple criteria. There is a series of methods for analyzing multicriteria decision making, such as game theory. Game theory is an effective mathematical method for formulating decision problems as competition between several entities [1]. These entities, or players, aspire either to achieve a dominant position over the other players or to cooperate with each other in order to find a position that benefits all [2]. Researchers have accumulated a vast literature on game theory and its applications. For example, recent advances in the study of evolutionary games are reviewed in [3–5], and some strategies in the spatial ultimatum game are discussed in [6, 7]. However, most of these studies do not consider attribute reduction, which can significantly reduce the computational complexity.

Different from the works mentioned above, in rough set theory, attribute reduction is an important concept. It supports the wide applications of rough sets. Moreover, classical rough sets [8–10] and their extensions [11–15] can be used in conflict analysis [16], a field related to decision making and game theory. Decision-theoretic rough sets (DTRS) [12, 13] may be particularly relevant to decision making and may benefit from new insights provided by game theory. In rough set theory, a concept is usually described by three classification regions: the positive region, the boundary region, and the negative region. The three regions in DTRS are systematically calculated from a set of loss functions according to the Bayesian decision procedure. The loss functions can be interpreted in terms of practical notions of costs and risks. In DTRS models, an object is classified into a particular region because the cost of classifying it into that region is less than the cost of classifying it into the other regions. The expected cost of classifying a set of objects is called the decision cost.

Generally speaking, attribute reduction can be interpreted as a process of finding the minimal set of attributes that can preserve or improve one or several criteria. The minimal set of attributes is called an attribute reduct. Some researchers have investigated the attribute reduction problem in DTRS models. Most of them addressed the problem based on the preservation or extension of the positive region or the nonnegative region [17–19]. However, for DTRS, the regions are nonmonotonic with respect to the set inclusion of attributes [18–20], so it is difficult to evaluate and interpret region-based attribute reduction. To tackle this problem, minimal-decision-cost attribute reduction was discussed in [20]. However, most existing studies of attribute reduction in DTRS concern only decision costs but not test costs.

Test cost is the time, money, or other resources one pays for obtaining a data item of an object. Most existing attribute reduction approaches assume that the data are already stored in datasets and available free of charge. However, data are often not free in reality. Recently, the topic of test costs has drawn much attention due to its broad applications. According to the data models constructed in [21], test-cost-sensitive attribute reduction has been studied based on classical rough sets [22, 23], neighborhood rough sets [24], covering rough sets [25, 26], and so forth. In these works, both backtracking and heuristic algorithms have been implemented through the open-source software Coser [27]. Unfortunately, few works have addressed attribute reduction with test cost in the context of DTRS.

In this paper, we study the cost-sensitive attribute reduction problem for DTRS by considering the tradeoff between test costs and decision costs, which is closely related to decision making and game theory. Since the purpose of decision making is to minimize the cost, the process of attribute reduction should help minimize the total cost, namely, the sum of test cost and decision cost. A decreasing average-total-cost attribute reduct is defined, which ensures that the total cost of decision making will be decreased or unchanged by using the reduct. In view of this, a minimal average-total-cost reduct (MACR) in DTRS models is introduced. An optimization problem is constructed in order to minimize the average total cost. It is a generalization of the minimal-decision-cost attribute reduction problem discussed in [20].

Both backtracking and heuristic algorithms are proposed to deal with the new attribute reduction problem. The backtracking algorithm is designed to find an optimal reduct for small datasets. However, for large datasets, it is not easy to find a minimal-cost attribute subset in this way. Therefore, we propose a heuristic algorithm to deal with this problem. To study the performance of both algorithms, experiments are undertaken on four datasets from the UCI library [28] through the software Coser. Experimental results show that the efficiency of the backtracking algorithm is acceptable, especially when the loss functions are not much larger than the test costs, while the heuristic algorithm is rather efficient and generates a minimal total cost reduct in most cases. Even when the generated reduct is not optimal, it is still acceptable from a statistical perspective. Moreover, both algorithms perform well on classification accuracy with CART and RBF-kernel SVM classifiers. Meanwhile, the number of selected attributes is effectively reduced by the two algorithms.

The rest of the paper is organized as follows. In Section 2, we review the main ideas of DTRS. Section 3 gives a detailed explanation of the minimal-total-cost attribute reduction in DTRS models. An optimization problem is proposed. In Section 4, we present a backtracking algorithm and a heuristic algorithm to address the optimization problem. Experimental settings and results are discussed in Section 5. Section 6 concludes and suggests further research trends.

2. Decision-Theoretic Rough Set Models

In this section, we review some basic notions of the DTRS model [12, 13, 17], which provides a theoretical basis for our method.

Definition 1. A decision system (DS) is the 5-tuple $S = (U, C, D, V, I)$, where $U$ is a finite nonempty set of objects called the universe, $C$ is the set of conditional attributes, $D$ is the set of decision attributes with only discrete values, $V = \{V_a \mid a \in C \cup D\}$, where $V_a$ is the set of values for each $a \in C \cup D$, and $I = \{I_a \mid a \in C \cup D\}$, where $I_a: U \to V_a$ is an information function for each $a \in C \cup D$.

In a decision system, given a set of conditional attributes $B \subseteq C$, the equivalence class of an object $x \in U$ with respect to $B$, namely, $\{y \in U \mid \forall a \in B,\ I_a(y) = I_a(x)\}$, is denoted by $[x]_B$, or $[x]$ if $B$ is understood. In DTRS models, the set of states $\Omega = \{X, \neg X\}$ indicates that an object is in a decision class $X$ and not in $X$, respectively. The probabilities for these two complementary states can be denoted by $P(X \mid [x])$ and $P(\neg X \mid [x]) = 1 - P(X \mid [x])$. With respect to the three regions, that is, the positive region $\mathrm{POS}(X)$, the boundary region $\mathrm{BND}(X)$, and the negative region $\mathrm{NEG}(X)$, the set of actions regarding the state $X$ is given by $\mathcal{A} = \{a_P, a_B, a_N\}$, where $a_P$, $a_B$, and $a_N$ represent the three actions of classifying an object into the three regions, respectively. Let $\lambda_{PP}$, $\lambda_{BP}$, and $\lambda_{NP}$ denote the costs incurred for taking actions $a_P$, $a_B$, and $a_N$, respectively, when an object belongs to $X$, and let $\lambda_{PN}$, $\lambda_{BN}$, and $\lambda_{NN}$ denote the costs incurred for taking the same actions when the object does not belong to $X$. The loss functions regarding the states $X$ and $\neg X$ can be expressed as a matrix given in Table 1.

Based on the loss functions, the expected costs of taking different actions for objects in $[x]$ can be expressed as
$R(a_P \mid [x]) = \lambda_{PP} P(X \mid [x]) + \lambda_{PN} P(\neg X \mid [x])$,
$R(a_B \mid [x]) = \lambda_{BP} P(X \mid [x]) + \lambda_{BN} P(\neg X \mid [x])$,
$R(a_N \mid [x]) = \lambda_{NP} P(X \mid [x]) + \lambda_{NN} P(\neg X \mid [x])$.
The Bayesian decision procedure leads to the following minimal-risk decision rules:
(P) If $R(a_P \mid [x]) \le R(a_B \mid [x])$ and $R(a_P \mid [x]) \le R(a_N \mid [x])$, decide $x \in \mathrm{POS}(X)$;
(B) If $R(a_B \mid [x]) \le R(a_P \mid [x])$ and $R(a_B \mid [x]) \le R(a_N \mid [x])$, decide $x \in \mathrm{BND}(X)$;
(N) If $R(a_N \mid [x]) \le R(a_P \mid [x])$ and $R(a_N \mid [x]) \le R(a_B \mid [x])$, decide $x \in \mathrm{NEG}(X)$.

Consider a special kind of loss functions with
$\lambda_{PP} \le \lambda_{BP} < \lambda_{NP}, \qquad \lambda_{NN} \le \lambda_{BN} < \lambda_{PN}$. (3)
That is, the cost of classifying an object $x$ belonging to $X$ into the positive region $\mathrm{POS}(X)$ is less than or equal to the cost of classifying $x$ into the boundary region $\mathrm{BND}(X)$, and both of these costs are strictly less than the cost of classifying $x$ into the negative region $\mathrm{NEG}(X)$. The reverse order of costs is used for classifying an object that does not belong to $X$. The decision rules can be reexpressed as follows:
(P) If $P(X \mid [x]) \ge \alpha$ and $P(X \mid [x]) \ge \gamma$, decide $x \in \mathrm{POS}(X)$;
(B) If $P(X \mid [x]) \le \alpha$ and $P(X \mid [x]) \ge \beta$, decide $x \in \mathrm{BND}(X)$;
(N) If $P(X \mid [x]) \le \beta$ and $P(X \mid [x]) \le \gamma$, decide $x \in \mathrm{NEG}(X)$,
where the parameters $\alpha$, $\beta$, and $\gamma$ are defined as
$\alpha = \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}$, $\beta = \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})}$, $\gamma = \frac{\lambda_{PN} - \lambda_{NN}}{(\lambda_{PN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{PP})}$. (4)
When
$(\lambda_{PN} - \lambda_{BN})(\lambda_{NP} - \lambda_{BP}) > (\lambda_{BP} - \lambda_{PP})(\lambda_{BN} - \lambda_{NN})$, (5)
we have $\alpha > \gamma > \beta$. After tie-breaking, the simplified rules are obtained as follows:
(P1) If $P(X \mid [x]) \ge \alpha$, decide $x \in \mathrm{POS}(X)$;
(B1) If $\beta < P(X \mid [x]) < \alpha$, decide $x \in \mathrm{BND}(X)$;
(N1) If $P(X \mid [x]) \le \beta$, decide $x \in \mathrm{NEG}(X)$.
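As an illustration, the following Python sketch computes the thresholds in (4) from a given loss function matrix; the numeric losses in the usage example are hypothetical.

# Computes the DTRS thresholds alpha, beta, and gamma of (4) from the six
# loss functions, assuming they satisfy the order conditions in (3).
def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma

# Hypothetical losses: lambda_PP=0, lambda_BP=2, lambda_NP=10,
# lambda_PN=8, lambda_BN=3, lambda_NN=0; here alpha > gamma > beta holds.
print(dtrs_thresholds(0, 2, 10, 8, 3, 0))  # (0.714..., 0.272..., 0.444...)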

Let $\pi_D = \{D_1, D_2, \ldots, D_m\}$ denote the partition of the universe induced by $D$. Based on the thresholds $(\alpha, \beta)$, one can divide the universe into three regions of the decision partition $\pi_D$:
$\mathrm{POS}_{(\alpha,\beta)}(\pi_D) = \{x \in U \mid P(D_{\max}([x]) \mid [x]) \ge \alpha\}$,
$\mathrm{BND}_{(\alpha,\beta)}(\pi_D) = \{x \in U \mid \beta < P(D_{\max}([x]) \mid [x]) < \alpha\}$,
$\mathrm{NEG}_{(\alpha,\beta)}(\pi_D) = \{x \in U \mid P(D_{\max}([x]) \mid [x]) \le \beta\}$, (6)
where $D_{\max}([x]) = \arg\max_{D_i \in \pi_D} P(D_i \mid [x])$.

Let $B \subseteq C$; the Bayesian expected costs of the positive rule, the boundary rule, and the negative rule can be expressed, respectively, as follows:
$\mathrm{COST}_P = \sum_{x \in \mathrm{POS}_{(\alpha,\beta)}(\pi_D)} \big[\lambda_{PP}\, P(D_{\max}([x]_B) \mid [x]_B) + \lambda_{PN}\, (1 - P(D_{\max}([x]_B) \mid [x]_B))\big]$,
$\mathrm{COST}_B = \sum_{x \in \mathrm{BND}_{(\alpha,\beta)}(\pi_D)} \big[\lambda_{BP}\, P(D_{\max}([x]_B) \mid [x]_B) + \lambda_{BN}\, (1 - P(D_{\max}([x]_B) \mid [x]_B))\big]$,
$\mathrm{COST}_N = \sum_{x \in \mathrm{NEG}_{(\alpha,\beta)}(\pi_D)} \big[\lambda_{NP}\, P(D_{\max}([x]_B) \mid [x]_B) + \lambda_{NN}\, (1 - P(D_{\max}([x]_B) \mid [x]_B))\big]$. (7)

3. Minimal-Total-Cost Attribute Reduction in Decision-Theoretic Rough Set Models

In this section, we focus on cost-sensitive attribute reduction based on test costs and decision costs in DTRS models. The objective of attribute reduction is to minimize the total cost through a tradeoff between test costs and decision costs. Minimizing the total cost is equivalent to minimizing the average total cost (ATC), so we study the minimal average-total-cost reduct problem.

Test cost is intrinsic to data, and there are a number of types of test-cost-sensitive decision systems; a corresponding hierarchy consisting of six models was proposed in [21]. Here, we consider only the test-cost-independent decision system, which is the simplest and most widely used model.

Definition 2 (see [21]). A test-cost-independent decision system (TCI-DS) is the 6-tuple $S = (U, C, D, V, I, tc)$, where $U$, $C$, $D$, $V$, and $I$ have the same meanings as in a DS and $tc: C \to \mathbb{R}^{+} \cup \{0\}$ is the test cost function. Test costs are independent of one another; that is, $tc(B) = \sum_{a \in B} tc(a)$ for any $B \subseteq C$.
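As a small illustration of test-cost independence (with hypothetical attribute names and costs), the cost of a subset is simply the sum of its members' costs:

# Test-cost independence: tc(B) is the sum of the costs of the tests in B.
def subset_test_cost(tc, B):
    return sum(tc[a] for a in B)

tc = {"a1": 5, "a2": 20, "a3": 8}   # hypothetical test costs
print(subset_test_cost(tc, {"a1", "a3"}))  # 13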

By introducing test cost into DTRS models, we can obtain the following definition.

Definition 3. A test-cost-independent cost-sensitive decision system in DTRS models (DTRS-TCI-CDS) is the 7-tuple $S = (U, C, D, V, I, tc, \lambda)$, where $U$, $C$, $D$, $V$, $I$, and $tc$ have the same meanings as in Definition 2 and $\lambda$ is the loss function matrix listed in Table 1, where $\lambda_{PP} \le \lambda_{BP} < \lambda_{NP}$ and $\lambda_{NN} \le \lambda_{BN} < \lambda_{PN}$.

An example of a DTRS-TCI-CDS is given in Tables 2, 3, and 4, which show a decision system, a corresponding test cost vector, and a corresponding loss function matrix, respectively.

For a given DTRS-TCI-CDS and $B \subseteq C$, the decision cost is composed of the three types of cost formulated in (7), so the decision cost can be expressed as
$\mathrm{DC}_B = \mathrm{COST}_P + \mathrm{COST}_B + \mathrm{COST}_N$,
where each term is computed with respect to the regions induced by $B$. According to (6), we can rewrite the decision cost formulation as a single sum over the universe:
$\mathrm{DC}_B = \sum_{x \in U} \big[\lambda_{r(x)P}\, P(D_{\max}([x]_B) \mid [x]_B) + \lambda_{r(x)N}\, (1 - P(D_{\max}([x]_B) \mid [x]_B))\big]$,
where $r(x) \in \{P, B, N\}$ indicates the region into which $x$ is classified by (6).

Obviously, we can obtain the average decision cost as follows:
$\mathrm{ADC}_B = \frac{\mathrm{DC}_B}{|U|}$.
Because the test cost is the same for every object once the test set $B$ is fixed, the average total cost (ATC) is given by
$\mathrm{ATC}_B = \mathrm{ADC}_B + tc(B) = \frac{\mathrm{DC}_B}{|U|} + \sum_{a \in B} tc(a)$.
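The following sketch assembles the pieces above into an ATC computation. It is our minimal reading of the formulas, not code from Coser: each equivalence class induced by $B$ is summarized by its size and the probability $p = P(D_{\max}([x]_B) \mid [x]_B)$, and the per-object decision cost is the expected loss of the rule selected by the thresholds.

# Average total cost of an attribute subset B, following our reading of
# the formulas above. `classes` lists (size, p) per equivalence class of B;
# `losses` holds the six lambda values; `tc_B` is the test cost tc(B).
def average_total_cost(classes, losses, tc_B):
    l_pp, l_bp, l_np = losses["PP"], losses["BP"], losses["NP"]
    l_pn, l_bn, l_nn = losses["PN"], losses["BN"], losses["NN"]
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    decision_cost, n = 0.0, 0
    for size, p in classes:
        n += size
        if p >= alpha:      # positive rule (P1)
            per_object = l_pp * p + l_pn * (1 - p)
        elif p > beta:      # boundary rule (B1)
            per_object = l_bp * p + l_bn * (1 - p)
        else:               # negative rule (N1)
            per_object = l_np * p + l_nn * (1 - p)
        decision_cost += size * per_object
    return decision_cost / n + tc_B  # ADC plus the per-object test cost

losses = {"PP": 0, "BP": 2, "NP": 10, "PN": 8, "BN": 3, "NN": 0}
print(average_total_cost([(4, 0.9), (3, 0.5), (3, 0.1)], losses, 15))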

Similar to [20], we study decreasing-cost attribute reduction to avoid the interpretation difficulties of region-preservation-based definitions. The definition of a decreasing average-total-cost attribute reduct is presented as follows.

Definition 4. In a DTRS-TCI-CDS, let $B \subseteq C$; $B$ is a decreasing average-total-cost attribute reduct if and only if
(1) $\mathrm{ATC}_B \le \mathrm{ATC}_C$;
(2) $\forall a \in B$, $\mathrm{ATC}_{B \setminus \{a\}} > \mathrm{ATC}_B$.
According to this definition, we choose the subsets of $C$ which ensure that the ATC will be decreased or unchanged for decision making in the process of attribute reduction.

In most situations, users want to obtain the smallest total cost in the classification procedure, so we propose an optimization problem whose objective is to minimize the average total classification cost. An attribute set that makes the ATC minimal is called a minimal average-total-cost reduct (MACR), and the corresponding optimization problem is called the MACR problem. We define them as follows.

Definition 5. In a DTRS-TCI-CDS, let $B \subseteq C$; $B$ is a MACR if and only if $\mathrm{ATC}_B = \min_{B' \subseteq C} \mathrm{ATC}_{B'}$.

Definition 6. The MACR problem:
input: a DTRS-TCI-CDS $S = (U, C, D, V, I, tc, \lambda)$;
output: $B \subseteq C$;
optimization objective: $\min \mathrm{ATC}_B$.
If we set $tc(a) = c$ for all $a \in C$, where $c$ is a constant, the MACR problem is essentially the minimal-decision-cost attribute reduct problem [20], so the former is a generalization of the latter.
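Before presenting the algorithms of Section 4, a brute-force formulation makes the optimization objective concrete. The sketch below is ours; `atc` stands for any implementation of the ATC, such as the one sketched earlier. It enumerates all subsets of $C$ and is therefore only practical for very small attribute sets.

# Brute-force MACR: enumerate all subsets of the attribute set and return
# one minimizing the average total cost. Exponential in |C|; illustrative.
from itertools import chain, combinations

def macr_bruteforce(attrs, atc):
    subsets = chain.from_iterable(combinations(attrs, r)
                                  for r in range(len(attrs) + 1))
    return min(subsets, key=lambda B: atc(frozenset(B)))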

4. Algorithms

Since the MACR problem is a combinatorial problem and it is not easy to obtain the optimal solution in linear time, we use a heuristic approach to obtain an approximately optimal solution. However, to evaluate the performance of a heuristic algorithm in terms of solution quality, we should first find an optimal reduct, so an exhaustive algorithm is also needed. In this section, we propose a backtracking algorithm and a λ-weighted heuristic algorithm to address the MACR problem.

4.1. The Backtracking Attribute Reduction Algorithm

The backtracking algorithm is illustrated in Algorithm 1. In order to invoke this backtracking algorithm, several global variables should be explicitly initialized as follows:
(1) $R = \emptyset$ is the reduct with minimal average total cost;
(2) $cmc = \mathrm{ATC}_{\emptyset}$ is the currently minimal average total cost;
(3) $il = 0$ is the current level test index lower bound.

Input: a DTRS-TCI-CDS $S = (U, C, D, V, I, tc, \lambda)$, selected tests $B \subseteq C$,
  current level test index lower bound $il$
Output: $R$ and $cmc$; they are global variables
Method: backtracking
   (1) for ($i = il$; $i < |C|$; $i$++) do
   (2)    $B' = B \cup \{a_i\}$;
   (3)    if ($tc(B') \ge cmc$) then
   (4)     continue; //Pruning for too expensive test costs
   (5)    end if
   (6)   if $\mathrm{ATC}_{B'} < cmc$ then
   (7)      $cmc = \mathrm{ATC}_{B'}$; //Update the minimal average total cost
   (8)      $R = B'$; //Update the minimal total cost reduct
   (9)    end if
   (10) backtracking($B'$, $i + 1$);
   (11) end for

The backtracking algorithm is denoted as backtracking($B$, $il$). A reduct with minimal ATC will be stored in $R$ at the end of the algorithm execution. Generally, the search space of the attribute reduction algorithm is $2^{|C|}$. To reduce the search space, we employ one pruning technique, shown in lines 3 through 5 of Algorithm 1. The attribute subset $B'$ will be discarded if the test cost of $B'$ is not less than the currently minimal average total cost ($cmc$), in that the decision costs are nonnegative in real applications.

Note that the total cost may decrease with the addition of attributes, which means that the ATC under an attribute set may be less than that under some of its subsets. This differs from previous works that considered only test cost [25], in which test costs increase when more attributes are selected. The following example gives an intuitive understanding.

Example 7. Take the DTRS-TCI-CDS listed in Tables 2–4 as an example. By computation, we find that the ATC under a certain selected attribute set is 3974.8, while the ATC is reduced to 3346.2 when further attributes are added to that set.

Therefore, no matter whether the currently selected attribute subset $B'$ satisfies $\mathrm{ATC}_{B'} < cmc$ or not, $B'$ continues expanding in the search for a minimal ATC, as shown in line 10 of Algorithm 1.
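A compact Python rendering of Algorithm 1 is given below. It is a sketch of the pseudocode above, not the Coser implementation; `tc` maps each attribute index to its test cost, and `atc` is any function computing the ATC of a subset.

# Backtracking search for a minimal average-total-cost reduct (Algorithm 1).
def backtrack_macr(n_attrs, tc, atc):
    best = {"R": frozenset(), "cmc": atc(frozenset())}

    def backtracking(B, il):
        for i in range(il, n_attrs):
            B2 = B | {i}
            if sum(tc[a] for a in B2) >= best["cmc"]:
                continue  # pruning: the test cost alone already reaches cmc
            cost = atc(B2)
            if cost < best["cmc"]:
                best["cmc"], best["R"] = cost, B2  # update cmc and R
            backtracking(B2, i + 1)  # expand even without an improvement

    backtracking(frozenset(), 0)
    return best["R"], best["cmc"]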

4.2. The λ-Weighted Heuristic Attribute Reduction Algorithm

The λ-weighted heuristic attribute reduction algorithm is listed in Algorithm 2; the algorithm framework contains two main steps. Let $B$ denote the set of currently selected attributes. First, we combine the current best attribute subset with $B$ according to the heuristic attribute significance function until $B$ becomes a superreduct. This step is essentially attribute addition. Then, we delete redundant attributes from $B$ to obtain a reduct with the currently minimal total cost.

Input: a DTRS-TCI-CDS $S = (U, C, D, V, I, tc, \lambda)$
Output: The reduct $B$
Method:
(1) $B = \emptyset$;
  //Addition
(2) $CA = C$;
(3) while $\mathrm{POS}_B(\pi_D) \ne \mathrm{POS}_C(\pi_D)$ do
(4)  $k = 1$; // $k$ controls the dimension of $A$
(5) for (each $A \subseteq CA$ with $|A| = k$) do
(6)  if $|\mathrm{POS}_{B \cup A}(\pi_D)| > |\mathrm{POS}_B(\pi_D)|$ then
(7)   Compute $\mathrm{sig}(B, A)$;
(8)  end if
(9)  end for
(10) Select $A'$ with the maximal $\mathrm{sig}(B, A')$;
(11)  if no such $A'$ exists then
(12)   $k$++;
(13)  Go to line 5;
(14)  else
(15)   $B = B \cup A'$; $CA = CA \setminus A'$;
(16)  end if
(17) end while
    //Deletion
(18) while $\exists\, a \in B$ such that $\mathrm{ATC}_{B \setminus \{a\}} \le \mathrm{ATC}_B$ do
(19)  for (each $a \in B$) do
(20)  Compute $\mathrm{ATC}_{B \setminus \{a\}}$;
(21)  end for
(22)  Select $a'$ with the minimal $\mathrm{ATC}_{B \setminus \{a'\}}$;
(23)   $B = B \setminus \{a'\}$;
(24) end while
(25) return $B$;

Lines 4 through 13 contain the key code of the addition step. There are two main differences from those in existing works [22, 25, 29]. One is the heuristic attribute significance function. We propose the λ-weighted attribute significance function as follows:
$\mathrm{sig}(B, A) = \big(|\mathrm{POS}_{B \cup A}(\pi_D)| - |\mathrm{POS}_B(\pi_D)|\big) \cdot \Big(\sum_{a_i \in A} tc(a_i)\Big)^{\lambda}$,
where $a_i$ is an attribute in $A$, $tc(a_i)$ is the test cost of $a_i$, and $\lambda \le 0$ is a user-specified parameter that penalizes expensive tests.
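Under our reconstruction of the significance function, a direct implementation is short; `pos_gain` is the growth of the positive region after adding the candidate attribute subset, and `lam` is the user-specified $\lambda \le 0$:

# lambda-weighted significance: a benefit term scaled by test cost raised
# to the power lambda (<= 0), so that cheaper attribute subsets win ties.
def significance(pos_gain, test_cost, lam=-0.5):
    return pos_gain * (test_cost ** lam)

print(significance(pos_gain=10, test_cost=25, lam=-0.5))  # 10 / 5 = 2.0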

The other difference is the computation steps. At first, the dimension of $A$ is 1, which means that we test the current remaining attributes one by one. However, since the positive region may shrink with the addition of attributes in DTRS models [19], $|\mathrm{POS}_{B \cup \{a\}}|$ may fail to exceed $|\mathrm{POS}_B|$ for all remaining attributes $a$ at the same time. In this case, we cannot choose a suitable attribute to make the current positive region expand toward that of $C$. To address this situation, we gradually increase the dimension of $A$, namely, consider multiple attributes simultaneously, and compute the corresponding values of the attribute significance function until at least one value is positive.
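The dimension-increasing addition step can be sketched as follows; `pos_size` and `sig` are assumed callbacks computing $|\mathrm{POS}_B(\pi_D)|$ and the significance function, `target_pos` is $|\mathrm{POS}_C(\pi_D)|$, and comparing regions by size is a simplification of the equality test in line 3 of Algorithm 2.

# Addition step with a growing candidate dimension k: when no single
# attribute enlarges the positive region, pairs, triples, etc. are tried.
from itertools import combinations

def addition_step(all_attrs, B, pos_size, sig, target_pos):
    k = 1
    while pos_size(B) < target_pos:
        if k > len(all_attrs - B):
            break  # safeguard; we assume the target is normally reachable
        candidates = [A for A in combinations(sorted(all_attrs - B), k)
                      if pos_size(B | set(A)) > pos_size(B)]
        if not candidates:
            k += 1  # no k-subset helps: consider larger groups
            continue
        best = max(candidates, key=lambda A: sig(B, set(A)))
        B, k = B | set(best), 1  # add the best group, reset the dimension
    return B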

5. Experiments

In this section, the performance of our two algorithms is studied. We try to answer the following questions by experimentation.
(1) Are both the backtracking algorithm and the heuristic algorithm efficient?
(2) Is the heuristic algorithm effective for the MACR problem?
(3) Are both algorithms appropriate for classification?

5.1. Data Generation

Experiments are undertaken on four datasets obtained from the UCI library. The basic information of the datasets is listed in Table 5, where $|C|$ is the number of condition attributes, $|U|$ is the number of instances, and $D$ is the name of the decision attribute. Note that missing values in these datasets (e.g., in the Voting dataset) are treated as a particular value. That is, "?" is equal to itself and unequal to any other value.

Since there are no intrinsic test costs and loss functions in the datasets mentioned above, we create these data for the experiments. First, we generate test costs, which are always represented by positive integers. Let $a$ be a condition attribute; $tc(a)$ is set to a random integer subject to the uniform distribution discussed in [22]. Then, we produce loss functions, which are random nonnegative integers satisfying (3) and (5). Since the loss functions are often much larger than test costs in real life, we set the average of the loss functions to a value well above the test costs. Of course, these assumptions about cost values can easily be changed if necessary. To observe whether the algorithm efficiency is influenced by the ratio of loss functions to test costs, the experiments shown below are undertaken with two groups of cost settings for each dataset listed in Table 5. Each group contains 100 different cost settings. Test costs in both groups are the same, but the loss functions are different. The average values of the loss functions (ALF) in group 1 and group 2 are around 500 and 3000, respectively. Experiments are undertaken on a PC with an Intel 2.20 GHz CPU and 4 GB memory.
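One cost setting can be generated along these lines. This is a sketch of our setup with illustrative ranges; it fixes $\lambda_{PP} = \lambda_{NN} = 0$ as an assumption, and in practice losses would be redrawn until condition (5) also holds.

# Generates one cost setting: uniformly distributed integer test costs and
# ordered nonnegative integer losses whose magnitude is controlled by alf.
import random

def generate_costs(n_attrs, alf, tc_range=(1, 100)):
    test_costs = [random.randint(*tc_range) for _ in range(n_attrs)]
    l_bp, l_np = sorted(random.sample(range(1, 2 * alf), 2))  # BP < NP
    l_bn, l_pn = sorted(random.sample(range(1, 2 * alf), 2))  # BN < PN
    losses = {"PP": 0, "NN": 0, "BP": l_bp, "NP": l_np, "BN": l_bn, "PN": l_pn}
    return test_costs, losses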

5.2. Efficiencies of the Two Algorithms

We study the efficiencies of both algorithms using two metrics. One is the number of backtrack steps performed by Algorithm 1; comparing it with the size of the search space [25], we investigate the efficiency of the backtracking algorithm. The other is the run-time comparison between the two algorithms, which is used to study the efficiency of the heuristic algorithm. The search space size and the average number of backtrack steps for Algorithm 1 are given in Table 6, and the average run-time of both algorithms is shown in Table 7, where run-time is measured in milliseconds.

From the results, we note the following.
(1) In both groups, the number of backtrack steps is less than the search space size, which demonstrates the effectiveness of the pruning technique in Algorithm 1.
(2) With the increase of the ALF, both the backtrack steps and the run-time of Algorithm 1 grow, which means that the efficiency of the backtracking algorithm is influenced by the ratio of loss functions to test costs. The reason is that, when the loss functions are much larger than the test costs, the currently minimal ATC, namely, $cmc$ in Algorithm 1, is also high compared to the current test costs. In this case, the pruning technique shown in lines 3 to 4 of Algorithm 1 cannot take effect.
(3) The run-time of Algorithm 2 is small compared with that of Algorithm 1, especially for the dataset Zoo. Therefore, the heuristic algorithm is very efficient. Moreover, the heuristic algorithm is stable in terms of run-time as the ALF increases.

In a word, the heuristic algorithm is highly efficient. Although the backtracking algorithm is not always efficient, it is still needed to evaluate the performance of the heuristic algorithm in terms of solution quality.

5.3. Effectiveness of the Two Algorithms

In this part, we observe the effectiveness of both algorithms using four metrics. First, two metrics defined in [22], namely, the finding optimal factor (FOF) and the average exceeding factor (AEF), are computed to measure the performance of the heuristic algorithm from the perspective of cost. In these computations, the results of the backtracking algorithm are used to evaluate the effectiveness of the heuristic algorithm. The results for the two metrics are shown in Figure 1.
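As we understand the definitions in [22], the two metrics can be computed from paired results of the heuristic and backtracking algorithms over the 100 cost settings:

# FOF: fraction of cost settings where the heuristic matches the optimum.
# AEF: average relative excess of the heuristic cost over the optimal cost.
def fof_aef(heuristic_costs, optimal_costs):
    pairs = list(zip(heuristic_costs, optimal_costs))
    fof = sum(h == o for h, o in pairs) / len(pairs)
    aef = sum((h - o) / o for h, o in pairs) / len(pairs)
    return fof, aef

print(fof_aef([10, 12, 11], [10, 10, 11]))  # (0.666..., 0.0666...)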

From the results, we note the following.
(1) The values of the FOF and the AEF are not significantly different between the two groups, which suggests that the performance of the heuristic algorithm is little influenced by the ratio of loss functions to test costs.
(2) All FOF values are above 0.5, and all AEF values are below 0.1. In other words, the results are acceptable.

Then, we compare the classification performance of the original data and of the reduced data obtained by our two algorithms based on 10-fold cross validation. CART and RBF-kernel SVM are used as learning algorithms, respectively. The results are depicted in Tables 8 and 9. We also present a comparison of the average numbers of selected attributes in Table 10.

From the results, we observe the following.
(1) The classification accuracies of our algorithms are a little lower than those on the raw data, but the numbers of selected attributes are effectively reduced, which is consistent with the essence of DTRS models. Different from classical rough sets, classification error is acceptable within a certain range according to the thresholds in DTRS models. Consequently, the reduction effectiveness is improved.
(2) With the increase of the ALF, the numbers of selected attributes grow for all datasets, and the classification performance improves on most datasets. This means that the tolerance of classification error decreases when the classification costs increase.
(3) For all datasets, the classification performance of Algorithm 1 is a little better than that of Algorithm 2; meanwhile, the numbers of selected attributes are larger in most cases.

6. Conclusions

In this paper, we address the cost-sensitive attribute reduction problem in DTRS models. By considering the tradeoff between decision costs and test costs, the minimal average-total-cost attribute reduct is defined, and the corresponding optimization problem is proposed. Both backtracking and heuristic algorithms are designed to deal with the optimization problem. Experimental results demonstrate the efficiency and the effectiveness of both algorithms. By combining test costs with the existing elements in DTRS models, such as the loss functions and the probabilistic approaches, our model is practical in real applications.

The following research topics deserve further investigation.
(1) The MACR problem could be revisited on more complicated test-cost-sensitive decision systems (DS), such as the simple common-test-cost DS and the complex common-test-cost DS [21]. The corresponding algorithms may also be more complicated.
(2) Sometimes the costs one can afford are limited. We could consider the attribute reduction problem with a test cost constraint or a total cost constraint in DTRS models.
(3) Recently, from the viewpoint of rough set theory, Yao [30, 31] has discussed three-way decisions, which may have many real-world applications. One could explore the cost-sensitive attribute reduction problem for three-way decisions with decision-theoretic rough sets.

In summary, this study suggests new research trends concerning decision-theoretic rough set theory, attribute reduction problem, and cost-sensitive learning applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is in part supported by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128 and the Education Department of Fujian Province under Grant no. JA12224.