Abstract

A good classifier can correctly predict new data for which the class label is unknown, so constructing a high-accuracy classifier is important. Hence, classification techniques are highly useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches. However, the approach also has two major deficiencies. First, it generates a very large number of association classification rules, especially when the minimum support is set low, which makes it difficult to select a high quality rule set for classification. Second, the accuracy of associative classification depends on the settings of the minimum support and the minimum confidence. In comparison with associative classification, some improved traditional rule-based classification approaches produce a compact classification rule set that is effective for prediction. Thus, some improved traditional rule-based classification approaches not only achieve better efficiency than associative classification but also attain higher accuracy. In this paper, we put forward a new classification approach called CMR (classification based on multiple classification rules). CMR combines the advantages of both associative classification and rule-based classification. Our experimental results show that CMR achieves higher accuracy than some traditional rule-based classification methods.

1. Introduction

Classification is a pervasive data mining problem with many applications, such as medical analysis, fraud detection, and network security [1]. A good classifier can correctly classify an object for which the class label is unknown. Therefore, building accurate and efficient classifiers is one of the essential tasks of data mining. As a consequence, classification techniques are quite useful in ubiquitous computing [2–5]. The classification problem has been extensively studied by the research community, and various types of classification approaches have been proposed (e.g., KNN [6], Bayesian classifiers [7], decision trees [8], support vector machines [9], neural networks [10], and associative classifiers [11]). Classification generally consists of two steps. First, we construct a classification model based on the training dataset. Second, we use the model to predict new instances for which the class labels are unknown.

In recent years, associative classification has been investigated widely [11–19]. It integrates association rule mining and classification. Associative classification induces from the training dataset a set of association classification rules that satisfy user-specified frequency (support) and confidence thresholds. It then selects a small set of high quality association classification rules and uses this rule set for prediction. The experimental results in [11, 12] indicate that associative classification achieves higher accuracy than some traditional classification approaches such as decision trees [8]. In comparison with some traditional rule-based classification approaches, associative classification has two characteristics: (1) it generates a large number of association classification rules and (2) the measures support and confidence are used for evaluating the significance of association classification rules. However, associative classification has some weaknesses. First, it often generates a very large number of association classification rules, especially when the training dataset is large and dense, and it takes great effort to select a set of high quality classification rules from among them. Second, the accuracy of associative classification depends on the settings of the minimum support and the minimum confidence. Third, the efficiency of associative classification is low when the minimum support is set low and the training dataset is large.

Traditional rule-based classification approaches have also been studied extensively [20–25]. Some traditional rule-based algorithms like FOIL [20], CN2 [21], and ELEM2 [22] discover a small set of high quality classification rules. They employ a sequential covering methodology, inducing one rule at a time and then removing the positive instances covered by each newly discovered rule. This rule induction proceeds in a greedy fashion, using a heuristic function to select the attribute value by which each rule is extended. Such algorithms can achieve higher efficiency than associative classification. However, their accuracy may not be as high on some datasets. One of the reasons is that they usually generate a much smaller set of classification rules, especially when the training dataset is small. Some novel rule-based classification approaches have been proposed recently [22–25]. They can generate more classification rules than the one-rule-at-a-time algorithms and achieve higher accuracy. CPAR [23] keeps all close-to-the-best attribute values during the rule building process. By doing so, CPAR can select more attribute values and build several rules at a time; thus, CPAR discovers more classification rules than FOIL and achieves higher average classification accuracy than the associative classification algorithm CBA [11]. The CATW algorithm [24] differs from PRM [23]: after an example is covered by a rule, CATW decreases both the tuple weight of the example and the weights of the attribute values that the rule contains. As a result, CATW generates a much larger set of classification rules than FOIL and achieves higher accuracy than CPAR and CMAR [12] in many cases. CMER [25] creates a seed set and a candidate set and connects the seed set with the candidate set to generate several classification rules at a time. CMER generates a much larger set of classification rules than FOIL, and the experimental results show that CMER achieves higher accuracy than FOIL.

In this paper, we propose a new classification approach called CMR (classification based on multiple classification rules). CMR combines the advantages of both associative classification and rule-based classification. It is distinguished from other rule induction methods in six aspects: (1) CMR uses multiple measures, the minimum support and foil gain, to select important attribute values. (2) CMR uses the minimum support and the minimum confidence to generate all classification rules with length one, like associative classification. (3) CMR constructs a seed set and a candidate set, both selected from the important attribute values. (4) CMR connects the seed set with the candidate set and then generates classification rules with length two as well as candidate patterns. (5) CMR adopts the minimum support and foil gain to build a new seed set from these candidate patterns. (6) CMR extends this new seed set and induces rules by adding the best attribute value one by one. The experimental results show that CMR achieves higher accuracy than some traditional rule-based algorithms.

The outline of this paper is as follows. In Section 2, we comment on some related work. In Section 3, first we give some definitions. Second, we use an example to describe the main ideas of inducing classification rules of CMR. Third, we develop the algorithm of CMR. Finally, we discuss how to predict class labels for unseen examples using the classification rule set generated by CMR. We report our experimental results in Section 4. We conclude the paper in Section 5.

2. Related Work

There are many classification approaches in the classification domain. They differ greatly in how they induce classification rules from the training dataset and how they test an object for which the class label is unknown. The work presented in this paper is related to existing research on various classification methods. Therefore, in the following we comment on some of it, covering both the extraction of classification rules and the testing strategies.

(1) Generating the Set of Classification Rules from the Training Dataset. The decision tree method selects the best attribute with the highest information gain and then builds the decision tree as a classification rule set. It generates a small rule set, especially when the training dataset is small. Yin and Han [23] introduce three ways of generating the set of classification rules from the training dataset: FOIL, PRM, and CPAR. FOIL uses foil gain to select the best attribute value from the whole training dataset, then selects the best attribute value in the conditional database of the newly selected attribute value, adding attribute values one by one to produce a classification rule. FOIL repeatedly searches for the current best rule and removes all the positive examples covered by the rule until all the positive examples in the dataset are removed. FOIL induces a small rule set, as the decision tree method does. PRM modifies FOIL. It uses foil gain to select the best attribute value and adds attribute values one by one; however, after an example is covered by a rule, instead of removing the example, PRM decreases its weight by multiplying it by a factor. PRM generates more rules than FOIL, and each positive example is usually covered more than once. CPAR builds rules by adding attribute values one by one, similarly to PRM, but it selects the best attribute value and also keeps all close-to-the-best attribute values, connecting the best attribute value with those close to it. By doing so, CPAR selects more attribute values and builds several rules at a time (a generic sketch of this family of sequential-covering algorithms follows this paragraph). CATW [24] not only decreases the weight of a covered example by multiplying it by a factor, but also decreases the weights of the attribute values that the rule matches after the rule is generated. ELEM2 takes into account the support of an attribute value and selects the most relevant attribute value for formulating rules. CMER [25] uses the support and foil gain to select several important attribute values to build the candidate set and the seed set; it connects the seed set with the candidate set and generates several classification rules at a time. CAEP [26] proposes a new measure, growth rate, for finding patterns. It produces emerging patterns whose growth rate is greater than or equal to the minimum threshold and then induces the classification rule set from these patterns. In associative classification [11], association rule mining is used to generate candidate rules, which include all conjunctions of attribute values that meet the minimum support threshold. Then the measure confidence is used to generate the association classification rule set.
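To make the shared skeleton of these one-rule-at-a-time learners concrete, the following is a minimal Python sketch of a generic sequential-covering loop under assumed data structures (examples and rule bodies represented as frozensets of attribute values); it is an illustration, not the exact code of any cited system. Setting decay=None gives FOIL-style removal of covered positives, while a decay factor below one gives PRM-style down-weighting.

    def sequential_covering(positives, negatives, build_one_rule,
                            decay=None, min_weight=0.05):
        # build_one_rule is assumed to greedily grow one rule body (a
        # frozenset of attribute values) by best-gain attribute values,
        # or to return None when no useful rule can be built.
        rules = []
        weights = {i: 1.0 for i in range(len(positives))}
        while sum(weights.values()) > min_weight * max(len(positives), 1):
            rule = build_one_rule(positives, negatives, weights)
            if rule is None:
                break
            rules.append(rule)
            for i, example in enumerate(positives):
                if rule <= example:  # the rule body covers this positive
                    # FOIL removes the example; PRM-style variants
                    # down-weight it so it can be covered again.
                    weights[i] = 0.0 if decay is None else weights[i] * decay
        return rules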

In this paper, we propose a new classification approach called CMR. First, CMR uses the support and confidence to generate classification rules with length one. Second, CMR connects the seed set with the candidate set and generates rules with length two. Third, CMR builds a new seed set and then induces rules by connecting the new seed set with the attribute values that are the best in the conditional databases. Finally, CMR removes the examples covered by all the generated rules and iterates this process. CMR induces many rules at a time.

(2) Classifying New Objects. When predicting the class label, CBA uses the best rule whose body matches the example. CMAR uses multiple association classification rules and a weighted chi-square to measure the strength of each group of rules under both conditional support and class distribution. CPAR selects the best rules for each class and compares the average expected accuracy of each class. CAEP sums the contributions of the individual emerging patterns. ELEM2 gives a decision score for each class indicated by the matched rules. CMR selects the best rules for each class and compares the average decision scores.

3. CMR: Classification Based on Multiple Classification Rules

In this section, we first give some definitions. Second, we use an example to describe the rule mining process of CMR. Third, we develop the algorithm of CMR. Finally, we give the measure of the significance of the classification rules and introduce how to predict new examples.

3.1. Definitions

Let $D$ be a training dataset with distinguished attributes $A_1, A_2, \ldots, A_m$. Let $C = \{c_1, c_2, \ldots, c_k\}$ be the class set of $D$. We introduce some definitions as follows.

Definition 1 (foil gain). Suppose that $p$ is an attribute value. There are $|P|$ positive examples and $|N|$ negative examples in $D$. The foil gain of $p$ is defined as follows: $$\mathit{gain}(p) = |P'| \left( \log_2 \frac{|P'|}{|P'| + |N'|} - \log_2 \frac{|P|}{|P| + |N|} \right),$$ where $|P'|$ is the number of positive examples which contain attribute value $p$ and $|N'|$ is the number of negative examples which contain attribute value $p$.
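To make the definition concrete, here is a minimal Python sketch of the standard foil gain computation as reconstructed above; the function name and the zero-coverage edge case are illustrative assumptions, not from the paper.

    import math

    def foil_gain(p_prime, n_prime, p, n):
        # p_prime/n_prime: positive/negative examples containing the value;
        # p/n: all positive/negative examples in the current dataset.
        if p_prime == 0:
            return 0.0  # a value covering no positive example has no gain
        return p_prime * (math.log2(p_prime / (p_prime + n_prime))
                          - math.log2(p / (p + n)))

    # Example: 4 of 10 positives and 1 of 10 negatives contain the value:
    # foil_gain(4, 1, 10, 10) == 4 * (log2(4/5) - log2(1/2)) ~ 2.71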

Definition 2 (support). The support of pattern $X$ is defined as follows: $$\mathit{sup}(X) = \frac{|D_X|}{|D|},$$ where $|D_X|$ is the number of examples in $D$ which contain pattern $X$ and $|D|$ is the number of examples in dataset $D$.

Definition 3 (confidence). The confidence of pattern $X$ with respect to class $c$ is defined as follows: $$\mathit{conf}(X \to c) = \frac{|D_{X,c}|}{|D_X|},$$ where $|D_{X,c}|$ is the number of examples which contain pattern $X$ and have the class label $c$.
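Both measures can be computed directly by counting. The sketch below assumes each example is a frozenset of attribute values paired with its class label; this data layout is an illustrative assumption.

    def support(pattern, dataset):
        # Definition 2: fraction of examples containing the pattern.
        covered = sum(1 for example, _ in dataset if pattern <= example)
        return covered / len(dataset)

    def confidence(pattern, target_class, dataset):
        # Definition 3: among examples containing the pattern, the
        # fraction whose class label is target_class.
        labels = [label for example, label in dataset if pattern <= example]
        return labels.count(target_class) / len(labels) if labels else 0.0

    # Usage with a toy dataset:
    # data = [(frozenset({"odor=none", "ring=one"}), "edible"),
    #         (frozenset({"odor=foul", "ring=one"}), "poisonous")]
    # support(frozenset({"ring=one"}), data)              -> 1.0
    # confidence(frozenset({"ring=one"}), "edible", data) -> 0.5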

3.2. Inducing Rules

We first generate the rules with length one. Then we construct a candidate set and a seed set. Third, we connect the seed set with the candidate set and generate the rules with length two. Fourth, we build a new seed set and generate the classification rules based on the new seed set. Finally, we remove the examples which are covered by the just-found rules and iterate the process. The following example shows the detailed rule induction process of CMR.

Example 4. The training dataset $D$ is shown in Table 1. We suppose that one attribute is the decision attribute and the others are the condition attributes. In the training dataset $D$, all examples with class label $c$ are positive examples and all other examples are negative examples. Let the minimum support and the minimum confidence be the given thresholds minsup and minconf. We induce classification rules for the class $c$.
First, CMR generates the rules with length one, adopting the minimum support and the minimum confidence as the measures. We calculate the support and the confidence of every attribute value in the positive examples. If the support of an attribute value $a$ is greater than the minimum support and the confidence of $a$ is greater than the minimum confidence, then $a$ is selected as a rule with length one. The resulting rules with length one are shown in Table 2.
Second, CMR constructs the candidate set and the seed set. If the foil gain of an attribute value $a$ in the positive examples is greater than zero, then $a$ is selected as an element of the candidate set. We use the average foil gain of all elements in the candidate set as the minimum foil gain threshold. If the foil gain of an attribute value $a$ in the candidate set is greater than the minimum foil gain, then $a$ is selected as an element of the seed set. The candidate set is shown in Table 3. In this example, the seed set consists of one element: the attribute value with the best foil gain in the candidate set.
Third, CMR connects the seed set with the candidate set to produce patterns. If the confidence of a pattern $X$ is greater than the minimum confidence, then $X$ is a classification rule with length two. If the support of a pattern $X$ is greater than the minimum support and the foil gain of $X$ is greater than the minimum foil gain, then $X$ is selected as an element of the new seed set. In this example, the new seed set consists of only one pattern.
Fourth, we generate rules by extending the patterns in the new seed set. For a seed pattern $X$, CMR selects the attribute value with the best foil gain from the conditional database of $X$ and forms a new pattern. CMR then computes the confidence of the new pattern. If the confidence is greater than the minimum confidence, a classification rule is induced; otherwise, CMR continues to find the best attribute value from the conditional database of the new pattern. After the rule containing $X$ is produced, CMR removes $X$ from the new seed set and generates rules for the other patterns in the new seed set until the new seed set is empty.
Finally, CMR removes the examples that are covered by the produced rules and iterates the process (a Python sketch of the candidate and seed construction in steps two and three follows this example).
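The following minimal Python sketch illustrates steps two and three of the example: building the candidate and seed sets and connecting them into length-two patterns. It reuses the illustrative foil_gain helper sketched under Definition 1; all names and data structures are assumptions for illustration.

    def build_candidate_and_seed(values, positives, negatives):
        # Candidate set: attribute values with foil gain greater than zero.
        # Seed set: candidates whose gain exceeds the average candidate gain.
        p, n = len(positives), len(negatives)
        gains = {v: foil_gain(sum(1 for e in positives if v in e),
                              sum(1 for e in negatives if v in e), p, n)
                 for v in values}
        candidates = {v: g for v, g in gains.items() if g > 0}
        if not candidates:
            return {}, set()
        avg_gain = sum(candidates.values()) / len(candidates)
        seeds = {v for v, g in candidates.items() if g > avg_gain}
        return candidates, seeds

    def connect(seeds, candidates):
        # Connect the seed set with the candidate set: each pairing of a
        # seed value with a distinct candidate yields a length-two pattern.
        return {frozenset({s, c}) for s in seeds for c in candidates if c != s}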

3.3. The Algorithm of Inducing Rules of CMR

Algorithm 1 is the algorithm of inducing rules of CMR.

Input: Training dataset D (P and N are the sets of all positive and negative examples,
respectively), the minimum support minsup, the minimum confidence minconf.
Output: A set of classification rules R.
    (1) the rule set R ← ∅, the candidate set C ← ∅, the seed set S ← ∅, the frequent
 pattern set F ← ∅ (the new seed set), the conditional positive example set P′ ← ∅, the
 conditional negative example set N′ ← ∅
    (2) while (|P| > 0) do
    (3)   compute the support and the confidence of each attribute value a in P
    (4)   if (sup(a) > minsup && conf(a) > minconf)
    (5)     R ← R ∪ {a}
    (6)   else if (sup(a) > minsup)
    (7)     keep a as a candidate attribute value for extension
    (8)   end if
    (9)   compute the foil gain of each attribute value a in P
   (10)   if (gain(a) > 0)
            C ← C ∪ {a}
          end if
   (11)   compute the average foil gain of all attribute values in C
          if (gain(a) > the average foil gain)
            S ← S ∪ {a}
          end if
   (12)   connect the seed set S with the candidate set C to generate patterns
   (13)   compute the support and the confidence of each generated pattern X
   (14)   if (conf(X) > minconf)
   (15)     R ← R ∪ {X}
   (16)   else if (sup(X) > minsup and gain(X) > the average foil gain)
            F ← F ∪ {X}
   (17)   else delete X
   (18)   end if
   (19)   for each pattern X in F
   (20)     if (an example in the positive example set contains X)
   (21)       P′ ← P′ ∪ {the example}
   (22)     end if
   (23)     if (an example in the negative example set contains X)
   (24)       N′ ← N′ ∪ {the example}
   (25)     end if
   (26)     while (|N′| > 0)
   (27)       find the best attribute value a according to P′ and N′
   (28)       append a to X
   (29)       remove all examples from P′ and N′ that do not satisfy X
   (30)     end while
   (31)   end for (each completed rule X is added to R)
   (32)   remove all examples from the positive example set P that satisfy the rule set R
   (33) end while
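As a complement to the pseudocode, the inner extension loop of steps (19)-(30) can be sketched in Python as follows: each new-seed pattern is grown over its conditional database by the attribute value with the best foil gain until the pattern becomes confident. The interfaces are assumptions, and foil_gain is the illustrative helper sketched under Definition 1.

    def extend_pattern(pattern, positives, negatives, all_values, minconf):
        # Build the conditional databases of the pattern (steps (19)-(25)).
        pos = [e for e in positives if pattern <= e]
        neg = [e for e in negatives if pattern <= e]
        body = set(pattern)
        # Grow the rule while conditional negatives remain (steps (26)-(30)).
        while neg:
            if len(pos) / (len(pos) + len(neg)) > minconf:
                break  # the pattern is already confident enough
            best, best_gain = None, 0.0
            for v in all_values - body:
                p2 = sum(1 for e in pos if v in e)
                n2 = sum(1 for e in neg if v in e)
                g = foil_gain(p2, n2, len(pos), len(neg))
                if g > best_gain:
                    best, best_gain = v, g
            if best is None:
                return None  # no attribute value improves the rule
            body.add(best)
            pos = [e for e in pos if best in e]
            neg = [e for e in neg if best in e]
        return frozenset(body)  # the body of the induced rule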

3.4. Prediction Using the Classification Rule Set

When we predict an example, we use all rules that the example satisfies. Three cases are possible when matching an example against a set of rules: only one match (the example matches only one rule), more than one match (the example matches more than one rule), or no match (the example does not match any rule). If the matched rules do not agree on the class label, we give a decision score for each class. Thus, we need to evaluate every rule to determine its prediction power. The significance of a rule is defined as follows.

For a rule $X \to c$, we use the expected accuracy to estimate the significance of the rule. The expected accuracy of the rule is given by (4) (denoted as SIG): $$\mathrm{SIG}(X \to c) = \frac{n_c + 1}{n_{\mathrm{tot}} + k}, \qquad (4)$$ where $k$ is the number of classes, $n_{\mathrm{tot}}$ is the total number of examples which contain pattern $X$, and $n_c$ is the total number of examples which contain pattern $X$ and have the class label $c$.
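As a worked instance of (4) (under the Laplace-style reading reconstructed above, with assumed numbers): a rule whose pattern covers $n_{\mathrm{tot}} = 10$ examples, $n_c = 9$ of them labeled with class $c$, in a two-class problem ($k = 2$) scores $$\mathrm{SIG} = \frac{n_c + 1}{n_{\mathrm{tot}} + k} = \frac{9 + 1}{10 + 2} = \frac{10}{12} \approx 0.83.$$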

For a testing example, we select the best rules matched by the example. If all the best rules have the same class label, the testing example is classified into that class. If the matched rules are not all of the same class, CMR computes a decision score for each class: the decision score of a class label is the average SIG of the matched rules with that label. CMR classifies the example into the class with the highest decision score.
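A minimal Python sketch of this prediction strategy follows; the rule tuples (body, class label, n_tot, n_c) are an assumed representation, and for brevity the sketch averages over all matched rules per class rather than only the selected best ones.

    def predict(example, rules, k):
        # Group the SIG scores of every rule whose body the example
        # satisfies by class label, then return the class whose average
        # score is highest.
        scores = {}
        for body, label, n_tot, n_c in rules:
            if body <= example:                 # the rule matches the example
                sig = (n_c + 1) / (n_tot + k)   # expected accuracy, as in (4)
                scores.setdefault(label, []).append(sig)
        if not scores:
            return None  # no rule matches; a default class could be returned
        return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))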

4. Experimental Results

All experiments are performed on the mushroom dataset. The size of the testing dataset is fixed in all experiments; the test examples are selected in turn from records 0–5500 of the mushroom dataset. We select the best rules for prediction.

In Table 4, the size of the training dataset is varied in turn, with the training examples selected randomly from the mushroom dataset, and the minimum support is varied over several settings. From Table 4, we can see that CMR attains different accuracy under different minimum supports. The average accuracy of CMR is highest at one of the minimum support settings; however, the accuracy of CMR does not vary noticeably with the minimum support.

Table 5 and Figure 1 show the accuracy of FOIL, CMR, and CMER, respectively. In Table 5 and Figure 1, the minimum support and the minimum confidence are set to fixed values. From Table 5 and Figure 1, we can conclude that (1) the accuracy of CMER is higher than that of FOIL, no matter how large the training dataset is, (2) when the training dataset is small, the accuracy of CMR is much higher than that of FOIL, (3) CMR achieves higher accuracy than CMER in many cases, and (4) CMR achieves higher average classification accuracy than CMER.

5. Conclusions

Accuracy and efficiency are crucial factors in classification tasks in data mining. Associative classification achieves higher accuracy than some traditional rule-based classification approaches in some cases. However, it generates a large number of association classification rules, so its efficiency is not high when the minimum support is set low and the training dataset is large. In comparison with associative classification, one reason traditional rule-based classification methods cannot achieve high accuracy is that they often generate only a few classification rules. In this paper, a new classification approach called CMR is proposed. CMR combines the advantages of both associative classification and rule-based classification. It induces many rules at a time; as a result, CMR generates many more classification rules than many other traditional rule-based classification methods, especially when the training dataset is small. Our experimental results show that the techniques developed in this paper are feasible and that CMR achieves high accuracy.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is funded by the China NSF Program (no. 61170129) and by the Fujian Province NSF Program (no. 2013J01259).