Abstract

Minimal test cost attribute reduction is an important problem in cost-sensitive learning. Recently, heuristic algorithms including the information gain-based algorithm and the genetic algorithm have been designed for this problem. However, in many cases these algorithms cannot find the optimal solution. In this paper, we develop an ant colony optimization algorithm to tackle this problem. The attribute set is represented as a graph in which each vertex corresponds to an attribute and the weight of each edge corresponds to pheromone density. Our algorithm contains three stages, namely, the addition stage, the deletion stage, and the filtration stage. In the addition stage, each ant starts from the initial position and traverses edges probabilistically until the stopping criterion is satisfied. The pheromone of the traveled path is also updated in this process. In the deletion stage, each ant deletes redundant attributes. Two strategies, called the centralized deletion strategy and the distributed deletion strategy, are proposed. Finally, in the filtration stage, the ant with minimal test cost is selected to construct the reduct. Experimental results on UCI datasets indicate that the algorithm is significantly better than the information gain-based one. It also outperforms the genetic algorithm on the medium-sized dataset Mushroom.

1. Introduction

Cost-sensitive attribute reduction has gained much research interest in rough sets. People have considered three types of costs, namely, test cost [1], misclassification cost [2], and delay cost [3, 4]. Test cost is the money and/or time required to obtain attribute values. In real applications, only a subset of the tests is needed to maintain enough information for classification. We would like to choose an attribute reduct [5] that minimizes the total test cost. This issue is called the minimal test cost reduct (MTR) problem [1].

We explain the MTR problem through a simple example. There is a medical decision system with five attributes, Mcv, Alkphos, Sgpt, Sgot, and Gammagt. The test costs of these five attributes are $55, $10, $15, $20, and $25, respectively. The decision system has three reducts, and Alkphos is the core attribute. The minimal reduct includes only two attributes, whereas the minimal test cost reduct costs only $50.

The minimal test cost attribute reduction (MTR) problem is proposed for saving resources, and it is meaningful in applications. The MTR problem is more general than the minimal reduct problem, which is NP-hard; hence the MTR problem is at least NP-hard. Consequently, heuristic algorithms are needed to deal with such a problem. Heuristic algorithms including the information gain-based $\lambda$-weighted reduction algorithm [1] and the genetic algorithm [6] have been designed to deal with the MTR problem. Unfortunately, they often do not find the optimal solution for medium-sized datasets. Although the competition approach [1] helps to improve the performance by constructing a population of reducts, the results are still unsatisfactory. There is still room to obtain more sophisticated algorithms through other techniques.

In this paper, we develop an algorithm based on ant colony optimization for the MTR problem. The attribute set is represented as a complete graph with each vertex corresponding to one attribute. Then batches of ants are generated for attribute subset selection. Our algorithm contains three stages, namely, the addition stage, the deletion stage, and the filtration stage. First, in the addition stage, core vertexes are compressed into one vertex that serves as the initial position of all ants. From the initial position, each ant selects the next vertex according to the test cost of each adjacent vertex and the pheromone of each adjacent edge. If the attribute vertexes an ant has traveled satisfy the positive region condition, the ant stops; otherwise, it continues to add attributes. Second, in the deletion stage, redundant attributes are deleted from the obtained attribute subsets, and the pheromone of each edge is updated. Two strategies, centralized deletion and distributed deletion, are designed for this stage. Third, in the filtration stage, the ant with the least test cost is selected, and the attribute subset corresponding to its path is output as the result.

To evaluate the performance of algorithms, we adopt three measures, namely, finding optimal factor (FOF), maximal exceeding factor (MEF), and average exceeding factor (AEF) [1]. Experimental results indicate that our algorithm outperforms the information gain-based algorithm on most datasets with different test cost distributions. It obtains better results than the genetic algorithm except on some small datasets. One possible reason is that the ant colony optimization algorithm produces more diverse solutions than the existing ones. The distributed deletion strategy is superior to the centralized one on the medium-sized dataset Mushroom.

The rest of the paper is organized as follows. Section 2 reviews the basic concepts in rough sets and decision system. Section 3 proposes the ant colony optimization to tackle the minimal test cost reduction problem. In Section 4, we present our experiment schemes and show the results. We also give a simple analysis of our experimental results. Finally, Section 5 presents the conclusion.

2. Preliminaries

This section reviews basic knowledge: test-cost-independent decision systems, relative reduct, minimal test cost reduct problem, genetic algorithm, and ant colony optimization.

2.1. Test-Cost-Independent Decision Systems

Most supervised learning approaches are based on decision systems. A decision system is often denoted by $S = (U, C, D, \{V_a \mid a \in C \cup D\}, \{I_a \mid a \in C \cup D\})$, where $U$ is a finite set of objects called the universe, $C$ is the set of conditional attributes, also called the set of tests, $D$ is the set of decision attributes, also called the decision, $V_a$ is the set of values for each $a \in C \cup D$, and $I_a : U \rightarrow V_a$ is an information function for each $a \in C \cup D$. We often denote $\{V_a \mid a \in C \cup D\}$ and $\{I_a \mid a \in C \cup D\}$ by $V$ and $I$, respectively. A decision system is often represented by a decision table, as shown in Table 1.

We consider the simplest case though most widely used type of cost-sensitive decision systems as follows.

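Definition 1 (see [7]). A test-cost-independent decision system (TCI-DS) $S$ is the 6-tuple $S = (U, C, D, V, I, c)$, where $U$, $C$, $D$, $V$, and $I$ have the same meanings as in a decision system and $c : C \rightarrow \mathbb{R}^{+} \cup \{0\}$ is the test cost function. Test costs are independent of one another; that is, $c(B) = \sum_{a \in B} c(a)$ for any $B \subseteq C$.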

We usually use a vector $c = [c(a_1), c(a_2), \ldots, c(a_{|C|})]$ to represent the cost function. An exemplary cost vector is shown in Table 2. In fact, cost-sensitive decision systems are more general than decision systems. If all elements in $c$ are 0, a TCI-DS coincides with a DS. For simplicity, free tests are not considered in this work. This consideration is reasonable since we always need some cost to obtain data.
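To make the independence assumption concrete, the following minimal sketch computes the total test cost of an attribute subset as the sum of the individual test costs. The cost values are those of the five attributes in the introductory example; the names `test_cost` and `subset_cost` are hypothetical and only serve the illustration.

```python
# Cost vector of the introductory example (attribute name -> test cost in dollars).
test_cost = {"Mcv": 55, "Alkphos": 10, "Sgpt": 15, "Sgot": 20, "Gammagt": 25}


def subset_cost(attributes, cost=test_cost):
    """Total test cost of an attribute subset under test-cost independence:
    the cost of a set is simply the sum of the costs of its members."""
    return sum(cost[a] for a in attributes)


# An illustrative subset (not claimed to be a reduct): 10 + 15 + 25 = 50.
print(subset_cost({"Alkphos", "Sgpt", "Gammagt"}))
# Taking all five tests costs 125.
print(subset_cost(test_cost))
```

Because costs are additive, comparing two candidate subsets only requires summing their entries in the cost vector.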

2.2. The Relative Reduct

The relative reduct is a crucial concept in rough sets, and there are many different definitions, such as positive region reducts [5], maximum distribution reducts [8], fuzzy reducts [9], and $\beta$-reducts [10]. These definitions are equivalent if the decision table is consistent. Furthermore, there are some extended concepts such as dynamic reducts [11], parallel reducts [12, 13], and M-relative reducts [14]. In some extensions of rough sets, for example, covering-based rough sets [15], there are other definitions of a reduct (see, e.g., [16–18]).

Most existing reduct problems aim at finding the minimal description of the data. The objectives include finding attribute subsets with the minimal size [5, 19], the minimal space [20], or a covering with the minimal number of subsets [16]. Since the test cost issue is the focus of this paper, we are interested in reducts with the minimal test cost. This type of reduct is defined as follows.

Definition 2 (see [1]). Let $S$ be a TCI-DS and $Red(S)$ the set of all reducts of $S$. Any $R \in Red(S)$, where $c(R) = \min\{c(R') \mid R' \in Red(S)\}$, is called a minimal test cost reduct.

The set of all minimal test cost reducts is denoted by $MTR(S)$, and the problem of constructing $MTR(S)$ is called the minimal test cost reduct (MTR) problem. As indicated in [1], the time complexity of computing $MTR(S)$ is the same as that of computing $Red(S)$.

2.3. The Minimal Test Cost Reduct Problem

Attribute reduction is a key issue of the rough sets research. The classical [5], covering-based [16, 21], decision-theoretical [3], variable-precision [10], dominance-based [22], and neighborhood [23] rough sets models address the reduction problem from different perspectives. A number of definitions of relative reducts exist [5, 19, 23, 24] for different rough sets models. This paper employs the definition based on the positive region.
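The positive region definition can be illustrated with a short sketch. It assumes a decision table stored as a list of dictionaries and a single decision attribute; the toy table and the names `positive_region` and `is_reduct` are hypothetical and are not taken from Table 1.

```python
from collections import defaultdict


def positive_region(rows, attrs, decision):
    """Indices of objects whose equivalence class with respect to attrs
    contains only one decision value."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    pos = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def is_reduct(rows, subset, all_attrs, decision):
    """A subset is a (positive region) reduct if it preserves the positive
    region of the full attribute set and no attribute in it is redundant."""
    subset = set(subset)
    full = positive_region(rows, all_attrs, decision)
    if positive_region(rows, sorted(subset), decision) != full:
        return False
    return all(
        positive_region(rows, sorted(subset - {a}), decision) != full
        for a in subset
    )


# Hypothetical toy decision table with conditional attributes a, b, c.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
print(is_reduct(rows, {"c"}, ["a", "b", "c"], "d"))       # True
print(is_reduct(rows, {"a", "b"}, ["a", "b", "c"], "d"))  # True
print(is_reduct(rows, {"a", "c"}, ["a", "b", "c"], "d"))  # False: a is redundant
```

In this toy table both {c} and {a, b} are reducts, so a cost vector decides which of them is the minimal test cost reduct.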

2.4. The Genetic Algorithm

In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution [25]. This heuristic (also sometimes called a metaheuristic) is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [25]. In [26], the genetic algorithm is employed to evolve the cost-sensitive decision trees. Recently, the genetic algorithm has been employed to tackle the minimal test cost reduction problem [6] and attribute reduction with test cost constraint [27].

2.5. The Ant Colony Optimization

Swarm intelligence is a relatively new approach to problem solving that takes inspiration from the social behaviors of insects and other animals [28]. The ant colony optimization (ACO) algorithm is a probabilistic technique for solving computational problems, which can be reduced to finding good paths through graphs. ACO algorithms are state-of-the-art for the sequential ordering problem [29], the vehicle routing problem with time window constraints [30], the quadratic assignment problem [31], the arc-weighted l-cardinality tree problem [32], and the shortest common supersequence problem [33]. In rough sets, the classical attribute reduction problem has been tackled by the ACO [34, 35].

2.6. Evaluation Measures

For evaluating the experiment results, we adopt evaluation measures proposed in [1]. The new algorithm is compared with the information gain-based heuristic algorithm on four UCI datasets [36].

We need a measure to evaluate the quality of one particular reduction. Since an algorithm can run on many datasets or one dataset with different test cost settings, we adopt three metrics from a statistical viewpoint. They are finding optimal factor (FOF), maximal exceeding factor (MEF), and average exceeding factor (AEF) [1].

2.6.1. Finding Optimal Factor

Let the number of experiments be $K$ and the number of successful searches of an optimal reduct be $k$. The finding optimal factor is defined as

$$op = \frac{k}{K}.$$

2.6.2. Exceeding Factor

For a dataset with a particular test cost setting, let $R'$ be an optimal reduct. The exceeding factor of a reduct $R$ is

$$ef(R) = \frac{c(R) - c(R')}{c(R')}.$$

The exceeding factor provides a quantitative metric to evaluate the performance of a reduct. It indicates the badness of a reduct when it is not optimal. Naturally, if $R$ is an optimal reduct, the exceeding factor is 0. To demonstrate the performance of an algorithm, statistical metrics are needed. Let the number of experiments be $K$. In the $i$th experiment ($1 \le i \le K$), the reduct computed by the algorithm is denoted by $R_i$. The maximal exceeding factor is defined as

$$\max_{1 \le i \le K} ef(R_i).$$

This shows the worst case of the algorithm given some dataset. Although it relates to the performance of one particular reduct, it should be viewed as a statistical rather than an individual metric. The average exceeding factor is defined as

$$\frac{\sum_{i=1}^{K} ef(R_i)}{K}.$$

Since it is averaged over different test-cost-sensitive decision systems, it shows the overall performance of the algorithm solely from a statistical perspective.
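The three measures can be computed directly from the test costs of the reducts found and of the optimal reducts. The following sketch assumes that both are available as equal-length lists, one entry per experiment; the function names and the sample numbers are hypothetical.

```python
def exceeding_factor(cost, optimal_cost):
    """Relative excess of a reduct's test cost over the optimal test cost."""
    return (cost - optimal_cost) / optimal_cost


def evaluate(found_costs, optimal_costs):
    """Return FOF, MEF, and AEF over a series of experiments."""
    efs = [exceeding_factor(c, o) for c, o in zip(found_costs, optimal_costs)]
    successes = sum(1 for e in efs if e == 0)  # experiments that hit the optimum
    return {
        "FOF": successes / len(efs),
        "MEF": max(efs),
        "AEF": sum(efs) / len(efs),
    }


# Hypothetical costs from four experiments: two of them reach the optimum.
result = evaluate(found_costs=[50, 60, 45, 50], optimal_costs=[50, 50, 45, 40])
print(result)
```

In this made-up run the optimum is found in half of the experiments, and the worst reduct exceeds the optimal cost by one quarter.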

3. The Algorithm

In this section, on the one hand, we revise the classical ant colony optimization. On the other hand, we present our ant colony optimization with different techniques to tackle the minimal test cost reduct problem.

3.1. The Problem Representation and Algorithm Framework

In order to apply ant colony optimization to the minimal test cost reduction problem, we adopt the following model.

Graph. The decision system is represented as a graph.

Vertex. An attribute is represented as a vertex. Each attribute vertex carries information about its test cost.

Edge. There is an edge between any two vertexes. Each edge carries information on pheromone density.

Adjacency Matrix. The values of the matrix represent the pheromone density of each edge.

Because our objective is attribute reduction rather than classification, which produces rule sets, we do not adopt a tree structure such as the ant colony decision tree. In this section, we employ the earliest and simplest form of ant colony optimization, the Ant System (AS). We present the general algorithm framework as follows.

Stage 1 (addition stage). Compute the core of the dataset. Ants are created in batches, each batch containing the same number of ants, so the total number of ants is the number of batches times the batch size. Each ant takes the core as its starting position. From the initial position, each ant traverses edges probabilistically until the stopping criterion is satisfied.

Stage 2 (deletion stage). Delete redundant attributes and update the pheromone of the traveled path.

Stage 3 (filtration stage). Gather the attribute subsets obtained by ants, and compute their test cost. Choose the attribute subset with minimal test cost, and output it as the result.

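Throughout the paper, the stopping criterion is the positive region condition. The selecting probability depends on the test cost of each adjacent attribute vertex and the pheromone of each adjacent edge. The probabilistic transition rule is

$$p_{ij}^{k} = \begin{cases} \dfrac{\tau_{ij}^{\alpha}\,\eta_{j}^{\beta}}{\sum_{l \in allowed_k} \tau_{il}^{\alpha}\,\eta_{l}^{\beta}}, & j \in allowed_k, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $k$ is the number of the ant, $\eta_j$, as the heuristic information, reflects the test cost of the attribute vertex $j$, $\tau_{ij}$ means the pheromone density of the edge $(i, j)$, $\alpha$ is the exponent of $\tau_{ij}$, $\beta$ is the exponent of $\eta_j$, and $allowed_k$ is the set of all unvisited adjacent vertexes of the attribute $i$.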
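The transition rule can be sketched in a few lines. The sketch below assumes that the heuristic information of a vertex is the reciprocal of its test cost, which is one natural choice but is not confirmed by the text; the parameter names `alpha` and `beta`, their default values, and the sample costs are likewise assumptions made for illustration.

```python
def transition_probabilities(current, unvisited, pheromone, cost,
                             alpha=1.0, beta=2.0):
    """Selecting probability of each unvisited vertex from the current one.

    pheromone[i][j] is the pheromone density of edge (i, j).
    ASSUMPTION: the heuristic information of vertex j is 1 / cost[j],
    so cheaper attributes are more attractive.
    """
    weights = {
        j: (pheromone[current][j] ** alpha) * ((1.0 / cost[j]) ** beta)
        for j in unvisited
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical setting: four attribute vertexes, uniform initial pheromone.
pheromone = [[1.0] * 4 for _ in range(4)]
cost = [2, 4, 4, 8]  # made-up test costs, not those of Table 2
probs = transition_probabilities(0, [1, 2, 3], pheromone, cost)
print(probs)  # vertexes 1 and 2 are equally likely; vertex 3 is least likely
```

With uniform pheromone the choice is driven by test cost alone; as pheromone accumulates on frequently traveled edges, those edges gain weight in the same formula.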

The difference between the centralized deletion strategy and the distributed deletion strategy is the time at which redundant attributes are deleted. With the centralized deletion strategy, redundant attributes are deleted after all ants have finished their journeys. With the distributed deletion strategy, the algorithm deletes redundant attributes as soon as each ant finishes its path and then adjusts the path. In this situation, the pheromone of edges adjoining redundant attribute vertexes is not updated. The adjusting method is illustrated in Figure 2.

Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; that is, simulations are run many times in order to estimate probabilities empirically, much like playing and recording results in a real casino, hence the name. We employ the Monte Carlo method to simulate the process by which an artificial ant selects the next attribute vertex probabilistically.
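One common way to realize such a probabilistic selection is roulette-wheel sampling: draw a uniform random number and walk through the cumulative probabilities. The sketch below is an illustration under that assumption; the function name `monte_carlo_select` and the sample probabilities are hypothetical.

```python
import random


def monte_carlo_select(probabilities, rng=random):
    """Pick a vertex at random according to the given selecting probabilities.

    probabilities maps each candidate vertex to its probability; the values
    are expected to sum to 1.
    """
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    last = None
    for vertex, p in probabilities.items():
        cumulative += p
        last = vertex
        if r < cumulative:
            return vertex
    return last  # guards against floating-point round-off in the sum


# Hypothetical probabilities for three candidate vertexes.
rng = random.Random(0)
counts = {1: 0, 2: 0, 3: 0}
for _ in range(10000):
    counts[monte_carlo_select({1: 0.7, 2: 0.2, 3: 0.1}, rng)] += 1
print(counts)  # the most probable vertex is chosen most often, but not always
```

This matches the behavior described in the running example: the attribute with the largest probability is likely, but not certain, to be selected.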

3.2. The ACO with Distributed Deletion

We present the detailed algorithm of the ant colony optimization with the distributed deletion strategy as Algorithm 1. The algorithm with the centralized deletion strategy is similar.

Input:
Output: , a reduct of
Method: acomtr
(1) ;//the best reduct obtained by ants
(2) for ( ; ; ++) do
(3)  for ( ; ; ) do
(4)    create ant[ ];
(5)    ;//takes the set of core attributes
(6)  end for
  //Each ant selects attributes
(7)  for (each where ant[ ] has not stopped) do
(8)   MonteCarlo( );//select the next vertex
(9)   if ( ) then
(10)    ant stops; //Delete redundant vertexes.
(11)     ;//minimal test cost
(12)    for ( ; ; - -) do
(13)     if ( ) then
(14)       ;
(15)     end if
(16)    end for
(17)    Adjust the traveled path
(18)    Update the pheromone of edges ant[ ] traveled
(19)   end if
(20)  end for
(21) end for
 //Choose the minimal test cost reduction from the last ants
(22) for ( ; ; ++) do
(23)    if     then
(24)    ;
(25)    ;
(26)  end if
(27) end for
(28) return ;

We explain the algorithm in detail as follows. Lines 1 through 8 correspond to Stage 1 in the algorithm framework. In lines 2 through 4, the ants are created batch by batch, so the total number of ants is the number of batches times the number of ants per batch. Each ant takes the core as its starting attribute set, as indicated by line 5. Lines 9 through 21 represent Stage 2. Vertexes are selected until all ants in the batch meet the positive region condition. Whenever an ant stops, it removes the redundant attributes and adjusts the traveled path. In line 18, the ant releases pheromone on the adjusted path. After all ants finish their journeys, the best reduct is selected, as shown in lines 22 through 27; these lines correspond to Stage 3. If we adopt the centralized deletion strategy, the algorithm instead deletes redundant attributes after all ants finish their journeys.
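The deletion and filtration steps can be sketched as follows. This is an illustration rather than the exact procedure of Algorithm 1: the positive region condition is abstracted as a hypothetical predicate `satisfies`, attributes are tested for redundancy in reverse order of selection (suggested by the decrementing loop in lines 12 through 16), and the toy condition and costs are made up.

```python
def delete_redundant(path, satisfies):
    """Backward deletion: drop every attribute whose removal still leaves
    the stopping condition satisfied."""
    kept = list(path)
    for attr in reversed(path):
        trial = [a for a in kept if a != attr]
        if satisfies(trial):
            kept = trial
    return kept


def filtration(paths, cost):
    """Choose the attribute subset with the minimal total test cost."""
    return min(paths, key=lambda p: sum(cost[a] for a in p))


# Hypothetical condition: either {c} or {a, b} is sufficient.
def satisfies(attrs):
    s = set(attrs)
    return {"c"} <= s or {"a", "b"} <= s


cost = {"a": 1, "b": 2, "c": 5}  # made-up test costs
reduced = [delete_redundant(p, satisfies) for p in (["a", "c", "b"], ["a", "b"])]
print(reduced)                    # [['c'], ['a', 'b']]
print(filtration(reduced, cost))  # ['a', 'b'] with total cost 3
```

Core attributes need no special handling in this sketch, because removing a core attribute always violates the positive region condition and the attribute is therefore kept.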

3.3. A Running Example

The TCI-DS is given by Tables 1 and 2. We illustrate the process of an ant in Figure 1, where attributes Headache, Muscle, Temperature, and Snivel are represented by 0, 1, 2, and 3, respectively.

After an ant travels a path, the redundant attributes are removed. After deletion, the algorithm reconstructs the path. This adjusting process is represented in Figure 2.

Each ant is placed on an attribute randomly. For simplicity, assume that there is one artificial ant in each batch and that the core is empty; hence any ant initially takes no vertex.

Stage 1 (addition stage)

Stage 1.1 (artificial ants start). Assume the ant starts from the attribute Headache.

Stage 1.2 (artificial ants select attributes)

Iteration 1. The ant is at the attribute Headache. It has three candidate attributes to select from: Muscle pain, Temperature, and Snivel.

According to (6), the denominator of selecting probability is .

We compute the probability of three selections:

The attribute with the largest selecting probability will be chosen as the next feature with high probability; that is to say, the ant does not necessarily select it. In this case, we assume the ant chooses the feature Snivel. At the same time, the ant adds the selected attribute to the visited attribute set. The attribute set the ant has visited does not yet satisfy the positive region condition, so the ant continues to select the next attribute.

Iteration 2. The ant is currently at the attribute Snivel.

The candidate features are Muscle pain and Temperature. According to the probability function, the denominator of selecting probability is .

Consider

Moreover, the selecting probability of Muscle pain is the larger one; hence the artificial ant almost certainly selects the attribute Muscle pain.

Now, the attribute set obtained by the ant satisfies the positive region constraint. The ant stops working and updates the pheromone information.

Stage 1.3 (pheromone updating). After an artificial ant stops working, it updates the pheromone density of the edges it has traveled. In our algorithm, whenever an ant crosses an edge, the pheromone density of that edge increases by one. Of course, the pheromone updating rule could be designed in other ways; we adopt this simple rule in this paper.

In this example, the attribute selection sequence of the ant is Headache, Snivel, and Muscle pain. The pheromone of edge (Headache, Snivel) and edge (Snivel, Muscle pain) increases by one.
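The update rule of this example can be sketched as follows, using the vertex numbering of Figure 1 (Headache is 0, Muscle pain is 1, Temperature is 2, Snivel is 3). The initial pheromone value of 1.0 and the symmetric treatment of edges are assumptions of the sketch, as is the function name `update_pheromone`.

```python
def update_pheromone(pheromone, path, deposit=1.0):
    """Increase the pheromone density of every edge on the traveled path."""
    for u, v in zip(path, path[1:]):
        pheromone[u][v] += deposit
        pheromone[v][u] += deposit  # ASSUMPTION: undirected edges, symmetric matrix
    return pheromone


# ASSUMPTION: every edge starts with pheromone density 1.0.
pheromone = [[1.0] * 4 for _ in range(4)]

# The ant visited Headache (0), Snivel (3), and Muscle pain (1) in this order.
update_pheromone(pheromone, [0, 3, 1])
print(pheromone[0][3], pheromone[3][1], pheromone[0][1])  # 2.0 2.0 1.0
```

Only the two traveled edges are reinforced; the edge between Headache and Muscle pain, which the ant did not cross, keeps its initial value.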

Each artificial ant goes through the above three steps.

Stage 2 (deletion stage). After all ants have stopped working, the algorithm deletes redundant attributes from the attribute sets selected by the last ants, using the positive region. If an attribute does not contribute to the positive region, we remove it.

Stage 3 (filtration stage). After deletion, we choose the reduct with minimal test cost among those obtained by the last ants as the result.

In this example, we adopt the centralized deletion strategy. Figure 2 illustrates the distributed deletion strategy in one iteration. In the figure, the ant travels the path 0, 1, 2, 3. Suppose attribute 1 is redundant; we remove it from the attribute subset and adjust the path. The adjusting process is shown in Figure 2(c).
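A minimal sketch of the path adjustment is given below. It assumes that adjusting simply means dropping the redundant vertexes and reconnecting their neighbors on the path, which is our reading of the description; the function names are hypothetical.

```python
def adjust_path(path, redundant):
    """Remove redundant vertexes and keep the remaining visiting order."""
    return [v for v in path if v not in redundant]


def path_edges(path):
    """Edges between consecutive vertexes of a path."""
    return list(zip(path, path[1:]))


original = [0, 1, 2, 3]
adjusted = adjust_path(original, {1})
print(adjusted)              # [0, 2, 3]
print(path_edges(original))  # [(0, 1), (1, 2), (2, 3)]
print(path_edges(adjusted))  # [(0, 2), (2, 3)]
```

Under this reading, pheromone is later deposited on the edges of the adjusted path only, so the edges adjoining the redundant vertex 1 receive none.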

4. Experiments

In this section, we try to answer the following questions by experimentation.

(1) How does the number of ants in the ant colony influence the result?

(2) How does the strategy of deleting redundant attributes influence the quality of the result?

(3) Does our ant colony optimization algorithm outperform the existing one?

The UCI datasets we used are Zoo, Voting, Tic-tac-toe, and Mushroom. Since these datasets have no test cost settings, for statistical purposes, we apply three common distributions to generate random test cost. The three distributions are uniform distribution, normal distribution, and Pareto distribution. In this paper, the test cost is a random integer ranging from 1 to 100. The exponent in (6) is set to be 2, since under many circumstances this value is a good setup [28].
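For readers who wish to reproduce a similar setting, the sketch below draws random integer test costs in the range from 1 to 100 under the three distributions. The distribution parameters (mean and spread of the normal distribution, shape of the Pareto distribution) and the clipping to the range are assumptions of this sketch; the text does not specify them.

```python
import random


def clip_cost(x, low=1, high=100):
    """Round to an integer and clip into [low, high]."""
    return max(low, min(high, int(round(x))))


def generate_costs(n, distribution, rng=None, low=1, high=100):
    """Generate n random integer test costs under the named distribution."""
    rng = rng or random.Random()
    costs = []
    for _ in range(n):
        if distribution == "uniform":
            x = rng.randint(low, high)
        elif distribution == "normal":
            # ASSUMPTION: centered in the range, spread of one sixth of it.
            x = rng.gauss((low + high) / 2, (high - low) / 6)
        elif distribution == "pareto":
            # ASSUMPTION: shape parameter 2.0, scaled so the minimum is `low`.
            x = low * rng.paretovariate(2.0)
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        costs.append(clip_cost(x, low, high))
    return costs


rng = random.Random(42)
for name in ("uniform", "normal", "pareto"):
    print(name, generate_costs(5, name, rng))
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient when the same cost settings must be shared by several algorithms.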

We do not design a parameter learning mechanism. The employed competition approach provides the selection mechanism for the parameter: different applications use the same range, and users do not need to specify the value of the parameter. This strategy is more straightforward than many other parameter tuning strategies, although other strategies could be designed to save time. The finding optimal factor, maximal exceeding factor, and average exceeding factor are employed to measure the effectiveness of the algorithms. We run each algorithm on each dataset 1000 times, so the results have statistical characteristics.

4.1. The Influence of Ant Counts on Experiment Results

In order to find the relationship between the number of ants and the quality of the result, we conduct this experiment. We run our algorithm with 100 ants, 150 ants, 200 ants … 400 ants. In this section, the number of experiments is set to be 100. The exponent in (6) is set to be 2. Results are shown in Tables 3 and 4.

4.2. Comparison with Existing Heuristic Algorithms

According to the result of the above experiments, we find 100 is the optimal setting of the number of the ants. We conduct an empirical study to examine the effect of our algorithm. To improve the performance, we use the competition approach [1] to enhance our algorithm. The exponent in (6) is set as integers ranging from 1 to 4. We illustrate the results among the information gain-based algorithm, GA-1, GA-2, ACO with centralized deletion and the ACO with distributed deletion, in Table 5. For clarity, the results on dataset Mushroom are shown in Figure 5.

Our algorithm is tested on the UCI datasets 1000 times, respectively. The new algorithm with different techniques is compared with the information gain-based algorithm [1] and the genetic algorithm [6]. Experimental results are listed in Figures 3, 4, and 5 and Tables 3, 4, and 5.

4.3. Experimental Results

Now we can answer the questions proposed at the beginning of this section.

(1) Experimental results indicate that the effect of the algorithm becomes worse as the number of ants increases. We find that 100 is the optimal setting of the number of ants. So, we run our algorithm with 100 ants to compare with the existing heuristic algorithms.

(2) From Figures 3 and 4, we infer that the distributed deletion outperforms the centralized one on the medium-sized dataset Mushroom without the competition approach. When we use the competition approach, the centralized deletion strategy and the distributed strategy produce results of similar quality, as shown in Table 5 and Figure 5.

(3) The ant colony optimization provides better performance than the information gain one and the genetic one. Our algorithm overcomes the shortcoming of the genetic algorithm, which performs badly on the larger dataset. On the medium-sized dataset Mushroom, the ACO outperforms the genetic one significantly. For example, when the test cost distribution is normal, the FOF of the ant colony optimization using centralized deletion and distributed deletion is , , respectively, outperforming the information gain one, GA-1, and GA-2 with 17.6%, 32.1%, and 52.1%, respectively. Figure 5 gives a more intuitive understanding of results on the dataset Mushroom. One possible reason is that the ant colony optimization algorithm produces more diverse solutions than the existing ones.

5. Conclusions

In this paper, we have pointed out the shortcoming of the existing heuristic algorithms, including the information gain-based one and the genetic one. We have designed a new algorithm based on ant colony optimization to tackle the MTR problem. In the deletion stage, our algorithm offers a centralized deletion strategy and a distributed deletion strategy. We have tested the new algorithm with three representative test cost distributions on four UCI datasets. Experimental results show that the new algorithm outperforms the existing ones significantly, especially on medium-sized datasets such as Mushroom. Moreover, when the distribution is normal, the distributed deletion strategy obtains better results than the centralized one on the Mushroom dataset.

Acknowledgments

This work is in part supported by the National Science Foundation of China under Grant no. 61170128, the Natural Science Foundation of Fujian Province, China, under Grant no. 2012J01294, State Key Laboratory of Management and Control for Complex Systems Open Project under Grant no. 20110106, and Fujian Province Foundation of Higher Education under Grant no. JK2012028.