Abstract

Ant colony optimization (ACO) algorithms have been successfully applied to identify classification rules in data mining. This paper proposes a new ant colony optimization algorithm for the hierarchical multilabel classification problem in protein function prediction. The proposed algorithm is characterized by an orderly roulette selection strategy that distinguishes the merits of the data attributes through attribute importance ranking during classification model construction. A new pheromone update strategy is introduced to prevent the algorithm from getting trapped in local optima, leading to more efficient identification of classification rules. Comparison studies with other closely related algorithms on 16 publicly available datasets reveal the efficiency of the proposed algorithm.

1. Introduction

In the last few decades, various techniques have been successfully proposed to solve classification problems in the fields of machine learning and data mining [1–3]. However, most existing classification techniques are designed to handle data with binary or nominal class labels, where class labels are independent. They cannot handle problems with multiple class labels organized in a class hierarchical structure (CHS) [4]. Such problems are commonly known as hierarchical classification, in contrast to one-level flat classification problems.

Due to their complex structure, hierarchical multilabel classification problems are more difficult to solve than flat single label classification problems. A sample may be assigned to several classes at the same time, where the classes form a hierarchical structure, for example, a tree or a directed acyclic graph [5]. Some difficulties are inherent to these problems. Firstly, because of the hierarchical structure of classes, fewer training examples are available for the nodes at the bottom of the tree than for those at the top; as such, it is more difficult to classify the nodes if the tree is deep. Secondly, samples classified in the lower levels of the hierarchy must satisfy the parent-child relationships; that is, they should also fall within the parent classes. Finally, a sample can also be classified into multiple classes that have no parent-child relationship.

Recently, many bioinspired heuristic algorithms have been designed to solve optimization problems and have been successfully applied to data classification problems [6–8]. Among them, ant colony optimization (ACO) algorithms have shown promising performance in mining classification rules of the form "IF term_1 AND term_2 AND … THEN class." The rules identified by ACO algorithms not only perform well in terms of predictive accuracy, but can also be easily expressed in natural language, which leads to good comprehensibility [9]. Nevertheless, the exponential increase of data volume and variety in the fields of machine learning and data mining has posed great challenges for ACO algorithms dealing with hierarchical multilabel classification problems, especially in terms of computational efficiency and robustness.

In this paper, a novel ACO-based algorithm is presented to identify classification rules for the hierarchical multilabel classification problem in protein function prediction. The proposed algorithm is equipped with an orderly roulette selection strategy and a new pheromone update strategy to enhance its capability of handling large-scale problems and its robustness. Particularly, in the orderly roulette selection, the data attributes are sorted so that the algorithm can distinguish the pros and cons of each attribute and construct a good classification model more efficiently. The new pheromone update strategy is designed to guide the ants toward better global optimal solutions; it strengthens the degree of pheromone update to avoid falling into local optima. To evaluate the performance of the proposed algorithm, 16 publicly available datasets are employed. Two closely related decision-tree-based algorithms (CLUS-HSC and CLUS-SC) [10] and two ACO-based algorithms (hmAntMiner [11] and hmAntMiner-C [12]) are involved in the comparison study. The proposed algorithm shows superiority in terms of prediction accuracy and comprehensibility of the classification model.

The remainder of this paper is organized as follows. Section 2 reviews ACO algorithms for classification rules discovery and the existing algorithms in hierarchical multilabel classification for protein function prediction. The proposed algorithm is also described in Section 2 where the details of the orderly roulette selection strategy and the new pheromone update strategy are provided. Section 3 presents the experimental results of the comparison studies on publicly available datasets. Finally, Section 4 concludes this study and some future directions are discussed.

2. Materials and Methods

2.1. ACO for Classification
2.1.1. ACO for Flat Single Label Classification Problems

ACO has been widely used for flat single label classification problems. The AntMiner algorithm proposed by Parpinelli et al. [13] represents one of the most well-known ACO-based classifiers. In AntMiner, a heuristic search method is introduced to identify rule information in the dataset, and a sequential covering strategy is used to discover one rule at a time. Based on the discovered rules, the correctly classified training samples are removed and the training set is reduced. The remaining samples are used for further rule discovery, and the process iterates until not enough training samples are available. The process of AntMiner is outlined in Algorithm 1.

Input: training samples
Output: a discovered list of rules
 (1) examples ← all training examples;
 (2) Rule_set ← {};
 (3) while |examples| > maximum uncovered do
 (4)  Initialize Pheromones(), Heuristic Information(examples), R_gb ← {};
 (5)  t ← 1;
 (6)  while t < maximum iterations and no stagnation do
 (7)   for i ← 1 to ants_size do
 (8)    R_i ← Create Rule(examples);
 (9)    Prune(R_i);
 (10)   Evaluate R_i;
 (11)   R_ib ← best of R_ib and R_i;
 (12)  end for
 (13)  Update Pheromones(R_ib);
 (14)  Evaluate R_ib;
 (15)  R_gb ← best of R_gb and R_ib;
 (16)  t ← t + 1;
 (17) end while
 (18) examples ← examples − Covered(R_gb, examples);
 (19) Rule_set ← Rule_set + R_gb;
 (20) end while
 (21) return Rule_set;

First, the rule set is initialized to be empty, and each ant starts to build a rule by adding one term at a time. The pheromone and heuristic values of a candidate term decide whether it should be added to the rule. To calculate the heuristic value of a term, the entropy and normalized information gain [14] are used. The rule consequent is assigned to the covered examples by a majority vote mechanism, and irrelevant rule terms are pruned to raise the accuracy [15]. Then the next rule is constructed by other artificial ants. After all the ants have built their rules, the best rule of that iteration is identified and the pheromone is updated on that basis [16]. If the iteration best rule is better than the global best rule, the global best rule is set equal to the iteration best rule; otherwise, the iteration best rule is discarded. The current global best rule is used to remove the examples it correctly covers, and the next global best rule is found using the remaining training examples. This process of constructing a global best rule is repeated until the maximum number of iterations is reached, or the currently constructed best rule is the same as the best one constructed in a specified number of previous iterations. The outer rule set growing loop stops when the number of remaining examples is less than a threshold. The output of the algorithm is an ordered set of rules, which is used to classify the test dataset.
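To make the heuristic concrete, the following Java sketch computes the entropy-based heuristic value of a single term in the spirit of [14]. It is a minimal sketch with illustrative class and method names, not the actual AntMiner implementation, and it omits the normalization over all terms.

// Sketch of AntMiner's entropy-based heuristic for an attribute-value term:
// eta = log2(k) - H(W | A_i = V_ij), where k is the number of classes and
// H is the class entropy of the examples holding that attribute value.
public final class AntMinerHeuristic {
    /** Class entropy of the covered examples, from their per-class counts. */
    static double entropy(int[] classCounts) {
        int total = java.util.Arrays.stream(classCounts).sum();
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Unnormalized heuristic value for one term. */
    static double heuristic(int[] classCounts, int numClasses) {
        return (Math.log(numClasses) / Math.log(2)) - entropy(classCounts);
    }

    public static void main(String[] args) {
        // Hypothetical term covering 8 examples of class 0 and 2 of class 1.
        int[] counts = {8, 2};
        System.out.printf("eta = %.4f%n", heuristic(counts, 2));
    }
}

A purer term (covering mostly one class) has lower entropy and therefore a higher heuristic value, which is what makes it attractive to the ants.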

Many variants of AntMiner have been proposed to improve classification performance. For example, AntMiner2 [17] and AntMiner3 [18] use a simple heuristic function, which adopts a density estimation in rule discovery and is calculated only once for each term, to replace the relatively complex heuristic of AntMiner. Moreover, to encourage exploration, AntMiner3 presents a new pheromone update method in which the pheromone is updated and evaporated only for those terms occurring in the rules. In an enhanced version of AntMiner (AntMiner+), the class label is selected before the ants build their rules, and a class-specific heuristic function enables the ants to know the class of the rule being extracted [19]. The relative importance of the pheromone and heuristic values is adjusted by two important ACO parameters, $\alpha$ and $\beta$. Another AntMiner variant, namely, AntMiner-CC, considers the relationship between the previously selected term and the next candidate term through a new heuristic function [20] based on the correlation of dataset attributes, given the preselected class label and its potential to maximize the correct coverage.

The algorithm proposed by Liang et al. [21] adopts a new heuristic information function considering both correlation and coverage in order to avoid deceptively high accuracy. cAnt-Miner [22] and its improved version [23] introduce a continuous attribute handling strategy and a new sequential rule covering strategy, respectively, to enhance the performance of rule identification. Smaldon and Freitas [24] improved AntMiner to produce an unordered set of classification rules. ACORI [25] uses an optimization method to find a near optimal order of rules in the decision list. Another cAnt-Miner extension [26] embeds several additions into the original AntMiner algorithm; multiple pheromone types are considered when the rule's consequent class is selected prior to the construction of the rule antecedent.

AntMiner can also be improved by hybridizing it with other heuristic optimization algorithms or classifiers. For example, by combining the strengths of AntMiner and particle swarm optimization [27], a hybrid algorithm provides very promising performance thanks to their specific capabilities in handling continuous attributes and nominal attribute-value construction. Ant-Tree-Miner [28] induces decision trees rather than a set of rules and is consequently quite different from AntMiner and its variants. The advantage of decision trees is that the model is easy to understand in a graphical form, in contrast to the list of classification rules output by most ACO classification algorithms. Boryczka and Kozak proposed the ACDT algorithm [29], in which agent-ants interact during the construction of decision trees via pheromone values to generate solutions efficiently. In a real world application, Feng et al. combined the SVM method with clustering based on a self-organized ant colony network to take advantage of both while avoiding their weaknesses, and used the resulting algorithm to classify network activities as normal or abnormal [30].

2.1.2. ACO for Hierarchical Multilabel Classification

In hierarchical classification, the class labels are naturally organized in a class hierarchy/taxonomy, which is typically represented as a tree or a directed acyclic graph (DAG), as shown in Figures 1(a) and 1(b). In the hierarchy, the nodes represent the class labels and the edges represent the relationships between them. Different class hierarchy structures impose different restrictions on the graph; for example, in a DAG a node can have more than one parent. To predict a class label in the hierarchy, the classifier should also predict all of its ancestor class labels, as the sketch below illustrates.
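The following Java sketch makes the ancestor constraint concrete: given a child-to-parents map for a tree or DAG, predicting a label also implies predicting all of its ancestors. The map and the GO-style label names are hypothetical, used only for illustration.

import java.util.*;

// Sketch: expanding a predicted class label to its full ancestor set.
public final class HierarchyExpand {
    static Set<String> withAncestors(String label,
                                     Map<String, List<String>> parents) {
        Set<String> out = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(label);
        while (!stack.isEmpty()) {
            String c = stack.pop();
            if (out.add(c)) {
                for (String p : parents.getOrDefault(c, List.of())) {
                    stack.push(p); // a DAG node may have several parents
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = Map.of(
            "GO:3", List.of("GO:1", "GO:2"), // DAG: node with two parents
            "GO:1", List.of("GO:0"),
            "GO:2", List.of("GO:0"));
        System.out.println(withAncestors("GO:3", parents));
    }
}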

It is clear that the edges between parent and child nodes represent IS-A relationships in the hierarchy. The nodes at the top levels of the hierarchy are easier to predict because they represent more general class labels, whereas the nodes at the bottom levels are more difficult to predict because more information is needed to distinguish them. For these reasons, the classifier should look for a tradeoff between generality and specificity in hierarchical classification. An example is given in Figure 2 for the classification of humans. If we predict an item as "Human," the prediction is 100% correct. However, predicting the lower level, more specific class is more important in this setting, and such a prediction is more liable to be wrong.

Hierarchical multilabel classification problems can be handled either by constructing a baseline classifier for each class label, known as the local approach, or by considering all the hierarchically related classes as a whole, that is, the global approach. For example, Koller and Sahami [31] proposed a local classifier approach that works by training a decision tree for each class label individually; for a given item, a baseline decision tree is used to predict the presence/absence of the corresponding class label. Chen et al. [32] extended the decision tree classifier to predict hierarchical class labels, where the best attributes are selected using an extended entropy measure. Vens et al. [10] investigated two local approaches based on decision trees, namely, CLUS-HSC and CLUS-SC, and a global approach, CLUS-HMC, that classifies the labels in the hierarchy simultaneously. In particular, CLUS-HMC is based on the predictive clustering tree framework [33], where each node in the tree is conceived as a cluster. Generally, local approaches tend to be more computationally demanding because a classifier must be trained many times. Moreover, misclassifications at higher levels are propagated and affect the classification of lower level labels [34]. Global approaches can overcome these problems by considering all the hierarchically related classes at once; however, global approaches are more difficult to model than local approaches.

ACO-based approaches have also been increasingly used to deal with hierarchical multilabel classification problems in a global manner. Chan and Freitas [35] proposed a new ACO algorithm, named MuLAM (Multilabel AntMiner), to discover multilabel classification rules that can predict one or more class labels at a time. Otero et al. [5] proposed hAntMiner (Hierarchical Classification AntMiner) for the hierarchical classification problem, an extension of the flat classification AntMiner algorithm. In hAntMiner, a hierarchical rule evaluation measure, hierarchical heuristic information, and an extended rule representation are used for classification. hAntMiner was further extended to hmAntMiner [11] to handle the hierarchical multilabel classification problem of protein function prediction. A new heuristic function based on the Euclidean distance is introduced in hmAntMiner to discover an ordered list of hierarchical multilabel classification rules. The experimental results presented in [11] demonstrate the superiority of hmAntMiner over other local/global methods including CLUS-HSC, CLUS-SC, and CLUS-HMC. Khan and Baig [12] proposed an hmAntMiner variant, namely, hmAntMiner-C, by introducing search space simplification mechanisms, a more accurate correlation-based heuristic function, and a new representation of the pheromone matrix and evaporation process. In this work, we also improve hmAntMiner by introducing an orderly roulette selection strategy and a new pheromone update strategy. The resultant algorithm is described in the following section.

2.2. The Proposed Algorithm

Following the general structure of the ACO algorithm, some modifications are made in the proposed algorithm to construct each list of rules. Firstly, we design a new roulette selection strategy to distinguish the merits of the data attributes through attribute importance ranking, so that each ant can find a better rule. Secondly, we use a new pheromone update strategy to strengthen the degree of pheromone update and give the ants better guidance; this update strategy uses the global best rule instead of the local optimal rule. Finally, the new algorithm can utilize the large number of uncharacterized proteins in the analysis and better determine their functions in biological processes. The result of the algorithm is an ordered list of hierarchical multilabel classification rules to predict protein functions. The pseudocode of the algorithm is described in Algorithm 2. The rule building process continues until all ants have built their own rules. Then the Klösgen measure [36] is used to evaluate the constructed rules so that the precision can be corrected for the class distribution. The proposed algorithm reduces the computational cost by only considering the relevant terms of the iteration best rule. The global best rule is updated after multiple assessments to help the ants find better rules in the next iteration.

Input: protein training examples
Output: classification model decision_list
 (1) decision_list ← {};
 (2) examples ← protein training examples;
 (3) while |examples| > max_uncovered_examples and not converged do
 (4)  Calculate New Heuristic Information(examples);
 (5)  Initialize Pheromone();
 (6)  R_gb ← {};
 (7)  t ← 1;
 (8)  while not (t > max_number_iterations OR Rule_Convergence) do
 (9)   R_ib ← {};
 (10)  for i ← 1 to colony_size do
 (11)    // use the new roulette selection strategy
 (12)    R_i ← Orderly_roulette_selection_strategy_CreateRule(examples);
 (13)    Prune(R_i);
 (14)    if Quality(R_i) > Quality(R_ib) then
 (15)     R_ib ← R_i;
 (16)    end if
 (17)   end for
 (18)   // use the new pheromone update strategy
 (19)   Intensive_Update_Pheromone(R_gb);
 (20)   if Quality(R_ib) > Quality(R_gb) then
 (21)    R_gb ← R_ib;
 (22)   end if
 (23) end while
 (24) decision_list ← decision_list + R_gb;
 (25) examples ← examples − Covered(R_gb, examples);
 (26) end while
 (27) return decision_list;

The details of the proposed algorithm are as follows. As shown in Algorithm 2, in line (1), it starts with an empty decision list. Then, in the outer while loop in lines (3)–(26), the algorithm iteratively adds one rule at a time to the decision list until the termination criteria are satisfied. In lines (8)–(23), an inner while loop is executed, and in each iteration a rule is constructed by an ACO procedure. All ants choose data attributes to be added to their current partial rule using the orderly roulette selection strategy. In line (13), the duplicate data attributes in the rule are pruned. The quality of the rules in the current iteration is evaluated and the best rule of this iteration is selected as the iteration best rule, as described in lines (14)–(16). The pheromone trails are updated in line (19) using the global best rule, based on the intensive pheromone update strategy, to guide the ants to search for better rules. In lines (20)–(22), if the iteration best rule is better than the global best rule, the iteration best rule is selected as the new global best rule. Then the global best rule is added to the decision list, and the covered training examples (training examples that satisfy the antecedent of the global best rule) are removed from the training set in lines (24)-(25). The procedure of creating a rule is repeated until the accuracy on a separate validation set, monitored during the training phase, begins to decrease, or the number of remaining protein training examples drops below the predefined max_uncovered_examples; this prevents the classification model from fitting the noise in the training data.

2.2.1. Hierarchical Multilabel Rule Consequent

The outputs of many previous algorithms are usually a single path in the consequent construction graph, that is, a trail from the root class label down to a leaf class label in the class hierarchy. But for protein prediction this form does not work. To apply those outputs to protein prediction problems, the examples covered by the rule (examples that satisfy the rule antecedent) can be used as the information to determine the rule consequent. The consequent of a rule in the proposed algorithm is computed by a deterministic procedure as follows:

$$C_i = \frac{|S_{r,i}|}{|S_r|}, \quad i = 1, \ldots, k, \tag{1}$$

where $S_r$ represents the set of examples covered by rule $r$, which generates a vector of length $k$ ($k$ is the number of class labels) as a result of that rule. $C_i$ is the $i$th component of the class vector, and $|S_{r,i}|$ is the number of examples covered by rule $r$ that belong to the $i$th class of the class hierarchy. The class vector $C$, of length $k$, thus represents the proportion of examples covered by rule $r$ in each particular class.

Based on the previous definition, each element of the vector is a continuous value ranging from 0.0 to 1.0, rather than a single value of 1 or 0, that is, a true or false value for a particular class label. The value is the probability that an example covered by the rule (satisfying its antecedent) belongs to the corresponding $i$th class of the hierarchy. Figure 3 shows an example of a rule discovered by the proposed algorithm. The predictor attributes in the IF part are amino acid ratios from the protein sequence, and the THEN part lists Gene Ontology terms representing the class labels. Following each GO term is the probability of the sequence belonging to that GO term class label.
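A minimal Java sketch of the deterministic consequent computation in (1) follows; the names and data layout are illustrative, not taken from the authors' code.

import java.util.List;

// Sketch: the consequent vector holds, for each class, the fraction of
// covered examples belonging to that class, as in (1).
public final class RuleConsequent {
    /**
     * @param coveredLabels 0/1 class-membership vectors (length k) of the
     *                      examples covered by the rule
     * @param k             number of class labels in the hierarchy
     */
    static double[] consequent(List<boolean[]> coveredLabels, int k) {
        double[] c = new double[k];
        for (boolean[] labels : coveredLabels) {
            for (int i = 0; i < k; i++) {
                if (labels[i]) c[i] += 1.0;
            }
        }
        for (int i = 0; i < k; i++) {
            c[i] /= coveredLabels.size(); // proportion in [0.0, 1.0]
        }
        return c;
    }

    public static void main(String[] args) {
        List<boolean[]> covered = List.of(
            new boolean[]{true, true, false},
            new boolean[]{true, false, false});
        System.out.println(java.util.Arrays.toString(consequent(covered, 3)));
        // -> [1.0, 0.5, 0.0]
    }
}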

2.2.2. Hierarchical Multilabel Rule Construction

(1) Heuristic Information Function. In the proposed algorithm, the heuristic function incorporates distance-based information using the class hierarchy. The variance of the set of examples covered by a term is incorporated in the heuristic information. A numeric vector of length $k$ represents the class label of each example: if the $i$th component of the class label vector is 0, the example does not belong to that class, and 1 means it does. We use a weighted Euclidean distance to represent the distance between class label vectors:

$$d(v_1, v_2) = \sqrt{\sum_{i=1}^{k} w_i \, (v_{1,i} - v_{2,i})^2}, \tag{2}$$

where $w_i$ is the weight of the $i$th class label and $v_{1,i}$ and $v_{2,i}$ represent the $i$th class value of two examples, respectively. We use the average squared distance between each example's class label and the set's mean class vector to represent the variance of a set of examples:

$$\mathrm{variance}(S) = \frac{1}{|S|} \sum_{e \in S} d(v_e, \bar{v})^2, \tag{3}$$

where $S$ is the set of examples covered by a term and $\bar{v}$ is the set's mean class label vector. Finally, the heuristic information of a term is given by

$$\eta_{\mathrm{term}} = (\mathrm{variance}_{\mathrm{best}} + \mathrm{variance}_{\mathrm{worst}}) - \mathrm{variance}(S_{\mathrm{term}}), \tag{4}$$

where $\mathrm{variance}_{\mathrm{best}} + \mathrm{variance}_{\mathrm{worst}}$ is the sum of the best and the worst variance values over all terms. This definition ensures that even the worst terms are assigned values greater than zero, which otherwise would prevent them from ever being selected by an ant.
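The following Java sketch computes the weighted Euclidean distance of (2) and the variance of (3) over a set of 0/1 class vectors; it is a direct reading of the formulas above, with illustrative names.

// Sketch of the distance-based quantities in (2) and (3).
public final class DistanceHeuristic {
    static double weightedDistance(double[] v1, double[] v2, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; i++) {
            double d = v1[i] - v2[i];
            sum += w[i] * d * d;
        }
        return Math.sqrt(sum);
    }

    /** Average squared distance to the mean class vector, as in (3). */
    static double variance(double[][] labels, double[] w) {
        int n = labels.length, k = labels[0].length;
        double[] mean = new double[k];
        for (double[] v : labels)
            for (int i = 0; i < k; i++) mean[i] += v[i] / n;
        double var = 0.0;
        for (double[] v : labels) {
            double d = weightedDistance(v, mean, w);
            var += d * d / n;
        }
        return var;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 0.75, 0.75};                  // class weights, cf. (5)
        double[][] labels = {{1, 1, 0}, {1, 0, 0}, {1, 0, 1}};
        System.out.printf("variance = %.4f%n", variance(labels, w));
    }
}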

Moreover, the proposed algorithm also uses a class-specific weighting scheme, where the weight is defined as follows:

$$w_i = w_0 \cdot \frac{1}{|P(c_i)|} \sum_{j \in P(c_i)} w_j, \tag{5}$$

where $w_0$ is set to 0.75, $P(c_i)$ is the parent class label set of the class label $c_i$, and $w_j$ is the weight associated with the $j$th parent class label of $c_i$. According to (5), classes in the higher part of the class hierarchy have bigger weights than class labels in the lower part of the hierarchy.
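A small Java sketch of the recursive weighting in (5) follows, assuming a child-to-parents map and that root classes receive $w_0$; memoization keeps the recursion linear on a DAG. The map and label names are hypothetical.

import java.util.List;
import java.util.Map;

// Sketch of (5): a class weight is w0 times the mean weight of its parents.
public final class ClassWeights {
    static final double W0 = 0.75;

    static double weight(String c, Map<String, List<String>> parents,
                         Map<String, Double> cache) {
        Double cached = cache.get(c);
        if (cached != null) return cached;
        List<String> ps = parents.getOrDefault(c, List.of());
        double w = W0; // root class labels receive w0
        if (!ps.isEmpty()) {
            double mean = 0.0;
            for (String p : ps) mean += weight(p, parents, cache) / ps.size();
            w = W0 * mean;
        }
        cache.put(c, w);
        return w;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = Map.of(
            "c2", List.of("c1"), "c1", List.of());
        System.out.println(weight("c2", parents, new java.util.HashMap<>()));
        // -> 0.5625 = 0.75 * 0.75, one level below the root
    }
}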

(2) A New Roulette Selection Strategy. In the search process of ACO, artificial ants repeatedly choose nodes through the guidance of pheromone and heuristic information and eventually converge on a best solution. The heuristic information value of each node is calculated based on (4). The pheromone values associated with an edge between two nodes accumulate over the iterations of ant colony optimization. The probability of selecting node $j$ from node $i$ is given by the following formula:

$$P_{ij} = \frac{\tau_{ij} \cdot \eta_j}{\sum_{k=1}^{N} \tau_{ik} \cdot \eta_k}, \tag{6}$$

where $\tau_{ij}$ is the concentration of pheromone between $i$ and $j$, $\eta_j$ is the heuristic information value of node $j$, $\tau_{ik}$ is the amount of pheromone between $i$ and candidate node $k$, with $k$ ranging from 1 to the total number $N$ of next attribute values, and $\eta_k$ is the current heuristic value of node $k$. All selectable nodes belong to those attributes that have not become prohibited.

Based on (6), each ant uses the roulette selection strategy to select effective nodes. Roulette selection is also known as fitness-proportionate selection; that is, the probability of an individual being selected is proportional to its fitness value, as shown in the following formula:

$$p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}, \tag{7}$$

where $n$ is the number of candidate nodes and $f_i$ is the fitness value of the $i$th candidate node. The larger the value of $f_i$, the greater the probability of the $i$th node being selected.

For example, each candidate node in Figure 4(a) is represented by the area of one slice of the pie chart, and the area of the slice is proportional to the fitness value of the candidate node. As the number of protein data attributes increases, as in Figure 4(b), the roulette selection strategy degenerates toward random selection. This paper proposes an orderly roulette selection strategy in which all nodes are arranged in an ordered sequence according to their selection probabilities. In this way, artificial ants can differentiate the merits of the nodes. The orderly roulette selection strategy removes the poor candidate nodes; that is, only the better candidate nodes can be selected, so that all artificial ants select excellent candidate nodes more efficiently and generate better rules.
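The following Java sketch shows one plausible reading of the orderly roulette selection: candidates are ranked by their $\tau \cdot \eta$ scores from (6), the worst are discarded, and a standard roulette spin is performed over the survivors. The retained fraction is an illustrative assumption; the paper does not state the exact cutoff.

import java.util.*;

// Sketch of orderly roulette selection: rank, truncate, then spin.
public final class OrderlyRoulette {
    static int select(double[] score, double retainFraction, Random rng) {
        Integer[] order = new Integer[score.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // orderly step: rank candidates from best to worst
        Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        int kept = Math.max(1, (int) Math.ceil(retainFraction * score.length));
        double total = 0.0;
        for (int i = 0; i < kept; i++) total += score[order[i]];
        // roulette spin over the retained (better) candidates only
        double spin = rng.nextDouble() * total, acc = 0.0;
        for (int i = 0; i < kept; i++) {
            acc += score[order[i]];
            if (spin <= acc) return order[i];
        }
        return order[kept - 1];
    }

    public static void main(String[] args) {
        double[] tauEta = {0.9, 0.05, 0.4, 0.02, 0.3}; // tau * eta per node
        System.out.println(select(tauEta, 0.6, new Random(42)));
    }
}

Truncating the ranked list is what prevents the degeneration seen in Figure 4(b): with many near-uniform slices, a plain spin is essentially random, whereas the ordered, truncated wheel keeps the selection pressure on the good candidates.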

(3) Hierarchical Multilabel Rule Evaluation. Using a distance-based measure, the variance gain can be applied to compute a rule quality measure. The basic idea of evaluating a rule with the variance gain measure is to virtually divide the training set $T$ into two partitions: the set of examples covered by the rule $r$ ($S_r$) and the set of examples not covered by the rule $r$ ($\bar{S}_r$). Then the variance gain of rule $r$ relative to $T$ is computed as follows:

$$\mathrm{varGain}(r, T) = \mathrm{variance}(T) - \frac{|S_r|}{|T|}\,\mathrm{variance}(S_r) - \frac{|\bar{S}_r|}{|T|}\,\mathrm{variance}(\bar{S}_r). \tag{8}$$

The variance gain naturally copes with hierarchical multilabel data, taking into account the relationships and similarities between class labels, and it favors rules that partition the training set into more homogeneous sets of examples. Thus, rules that cover a more homogeneous set of examples, while also leaving a more homogeneous set of examples uncovered, are preferred.
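A minimal Java sketch of (8) follows, taking the already computed set variances as inputs; the numbers in the usage example are hypothetical.

// Sketch of the variance gain of (8): the drop in class-vector variance
// achieved by splitting T into covered and uncovered examples.
public final class VarianceGain {
    static double varianceGain(double varT, int sizeT,
                               double varCovered, int sizeCovered,
                               double varUncovered, int sizeUncovered) {
        return varT
             - ((double) sizeCovered / sizeT) * varCovered
             - ((double) sizeUncovered / sizeT) * varUncovered;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a rule covering 40 of 100 examples.
        System.out.printf("varGain = %.4f%n",
            varianceGain(0.50, 100, 0.20, 40, 0.35, 60));
    }
}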

(4) A New Pheromone Update Strategy. The pheromone values are associated with the edges between vertices in the construction graph. Because the number of protein data attributes is large, it is difficult for artificial ants to converge to the optimal solution. In the pheromone matrix, the decrease of pheromone concentration is accomplished by pheromone evaporation: over time, the amount of pheromone on all edges is reduced by an evaporation factor $\rho$, while the global best rule reinforces its pheromone concentration based on its quality. The quality of a rule is computed as follows:

$$Q = \frac{\mathrm{TP}}{P} \times \frac{N - \mathrm{FP}}{N}, \tag{9}$$

where TP and FP, respectively, refer to the numbers of correct and incorrect examples covered by the rule for the selected class label, $P$ is the total number of examples whose class label is the selected class, and $N$ is the total number of examples belonging to other classes. Equation (9) is used to evaluate all the rules.
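The quality measure of (9) can be sketched in Java as sensitivity times specificity; the counts in the example are hypothetical.

// Sketch of the rule quality of (9), with P = positives of the selected
// class and N = examples of all other classes.
public final class RuleQuality {
    static double quality(int tp, int fp, int p, int n) {
        double sensitivity = (double) tp / p;
        double specificity = (double) (n - fp) / n;
        return sensitivity * specificity;
    }

    public static void main(String[] args) {
        // Hypothetical rule: covers 30 of 40 positives and 5 of 60 negatives.
        System.out.printf("Q = %.4f%n", quality(30, 5, 40, 60));
    }
}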

The pheromone update formula is given as follows:

$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \tau_{ij}(t)\,Q_{gb}, \tag{10}$$

where $Q_{gb}$ is the quality of the global best rule and $\tau_{ij}(t)$ is the pheromone concentration on the edge between $i$ and $j$ in the $t$th iteration; the reinforcement term is applied only to the edges used by the global best rule. In our new pheromone update function, the update amplitude of the pheromone concentration of the global best rule increases more than in the original version. The pheromone on the better rules accumulates faster and in larger amounts, which strengthens the convergence of the algorithm.
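A Java sketch of one plausible reading of (10) follows: global evaporation plus a quality-proportional deposit on the edges of the global best rule. The exact amplitude is an assumption, not the authors' code.

// Sketch of the intensified pheromone update of (10).
public final class PheromoneUpdate {
    static void update(double[][] tau, boolean[][] inGlobalBest,
                       double rho, double qGlobalBest) {
        for (int i = 0; i < tau.length; i++) {
            for (int j = 0; j < tau[i].length; j++) {
                tau[i][j] *= (1.0 - rho);                 // evaporation on all edges
                if (inGlobalBest[i][j]) {
                    tau[i][j] += tau[i][j] * qGlobalBest; // intensified deposit
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] tau = {{1.0, 1.0}, {1.0, 1.0}};
        boolean[][] best = {{true, false}, {false, false}};
        update(tau, best, 0.1, 0.8);
        System.out.println(java.util.Arrays.deepToString(tau));
    }
}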

3. Experimental Results

In this section, the experimental setting is first introduced and then the performance of the proposed algorithm is evaluated using 16 publicly available datasets [10], which include two different class hierarchy structures: the tree structure, that is, the FunCat datasets, and the DAG structure, that is, the Gene Ontology (GO) datasets. The DAG structure represents a more complex hierarchical organization, in which a particular node of the hierarchy can have more than one parent; in contrast, in tree structures each node has only one parent. The average numbers of class labels of the FunCat and GO datasets are 489 and 3932, respectively. The average numbers of labels per example in the FunCat and GO datasets are 8.5 and 34.2, respectively. The detailed information on the two dataset groups is provided in Table 1.

In the experiments, 2/3 of each dataset is used for training and the remaining 1/3 is used for testing. The proposed algorithm is compared with two closely related decision-tree-based algorithms (CLUS-HSC and CLUS-SC) [10] and two ACO-based algorithms (hmAntMiner [11] and hmAntMiner-C [12]). CLUS-SC is a local approach that induces a decision tree for each class label individually to deal with hierarchical multilabel classification problems. CLUS-HSC is also a local approach, constructing decision trees in a top-down fashion to predict the functions of protein data. hmAntMiner is a global approach that can discover an ordered list of hierarchical multilabel classification rules based on ant colony optimization, and hmAntMiner-C is an improved version of hmAntMiner. We use the same training and test partitions for all algorithms in the experiments to guarantee a fair comparison.

3.1. Performance Metric and Parameters Setting

To evaluate the proposed algorithm, the main consideration is the classification accuracy, that is, the percentage of correctly classified test samples. The comprehensibility of the classifiers [37, 38] is assessed by the number of discovered rules and the number of terms per rule, which are used as indirect performance metrics.

Generally, more iterations and more ants can help obtain a better result. However, simply increasing these two parameters may cause a great increase in execution time for only a small gain in accuracy. To overcome this problem, we use the F-Race racing procedure [38] to identify optimal parameter settings. For each of the two parameters mentioned above, three different values are tested, and the resulting nine combinations of parameters are commonly used in ACO-based algorithms [39]. Our experiments show that when the maximal number of iterations is set to 10, the algorithm obtains the best tradeoff between convergence and time consumption. Besides, the number of ants is set to 10, which ensures that enough ants are employed to find a better solution. It has been validated in extensive experiments that the remaining parameter settings can obtain higher accuracy while maintaining a reasonable execution time [11]. All the parameter settings for our proposed algorithm are shown in Table 2, while the parameters of CLUS-SC and CLUS-HSC are set as recommended in their paper [10].

3.2. Precision-Recall Curves to Evaluate Classification Model

In information retrieval [40] and hierarchical multilabel classification [10], PR (precision-recall) curves are frequently used because of their suitability for highly skewed datasets (many more negative examples than positive ones). A PR curve plots precision against recall. Precision is the number of correct positive predictions divided by the total number of positive predictions. Recall is the number of correct positive predictions divided by the total number of positive examples, that is, examples belonging to the predicted class label. These two values take only the positive predictions into account, so the number of negative predictions does not influence the evaluation. As mentioned above, the lower level classes are less likely to yield true positive results. Since PR curves ignore the true negative examples, they reflect how well a rule predicts the presence of a particular class label.
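A Java sketch of one PR-curve point follows; sweeping the threshold over the predicted class probabilities traces the full curve. The data in the example are hypothetical.

// Sketch: precision and recall at one probability threshold,
// ignoring true negatives as PR analysis does.
public final class PrPoint {
    static double[] prAt(double[] predicted, boolean[] actual, double t) {
        int tp = 0, fp = 0, pos = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (actual[i]) pos++;
            if (predicted[i] >= t) {
                if (actual[i]) tp++; else fp++;
            }
        }
        double precision = (tp + fp) == 0 ? 1.0 : (double) tp / (tp + fp);
        double recall = pos == 0 ? 0.0 : (double) tp / pos;
        return new double[]{precision, recall};
    }

    public static void main(String[] args) {
        double[] pred = {0.9, 0.7, 0.4, 0.2};
        boolean[] truth = {true, false, true, false};
        System.out.println(java.util.Arrays.toString(prAt(pred, truth, 0.5)));
        // -> [0.5, 0.5]: one of two predictions correct, one of two positives found
    }
}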

3.3. Comparisons of with Various Classification Algorithms

In this subsection, the performance of our algorithm is compared with two classical classification algorithms (CLUS-HSC and CLUS-SC) and two ACO-based classification algorithms (hmAntMiner and hmAntMiner-C). Our algorithm is implemented in Java. The software myra-3.7 [41] is adopted, while a Java library for multilabel learning [42] is used to run CLUS-HSC and CLUS-SC. The results in Table 3 show the average accuracy achieved by the cross-validation procedure, followed by the standard error, for all algorithms on the corresponding datasets. The experimental results concerning the size of the constructed classification model are summarized in Table 4, where the smallest model size on each dataset is marked in boldface. The results of CLUS-HSC and CLUS-SC are measured by the average number of leaf nodes in the generated decision trees. The model size of the remaining algorithms is obtained by recording the average number of rules.

The Vargha-Delaney A-test [43] is used to measure the statistical significance of the experimental results. It is a nonparametric effect magnitude test that differentiates between two samples of observations. Its return value is a probability between 0 and 1, indicating the probability that a randomly selected observation from one sample is bigger than a randomly selected observation from the other, which also represents the degree to which the two samples overlap. A value in the interval [0, 0.29] or [0.71, 1.0] indicates a significant difference between the two samples; in other cases, no significant difference is observed. In Tables 5 and 6, the symbols "+" and "−" denote that the proposed algorithm is significantly better or worse than the corresponding compared algorithm, respectively, and "~" indicates that the results of the two compared algorithms are similar.
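For reference, the A statistic can be computed with the following Java sketch; the accuracy samples in the example are hypothetical.

// Sketch of the Vargha-Delaney A statistic: the probability that an
// observation drawn from x exceeds one drawn from y, ties counted as 0.5.
public final class VarghaDelaney {
    static double aStatistic(double[] x, double[] y) {
        double wins = 0.0;
        for (double xi : x) {
            for (double yi : y) {
                if (xi > yi) wins += 1.0;
                else if (xi == yi) wins += 0.5;
            }
        }
        return wins / ((double) x.length * y.length);
    }

    public static void main(String[] args) {
        double[] accA = {0.80, 0.82, 0.85};
        double[] accB = {0.70, 0.81, 0.84};
        System.out.printf("A = %.3f%n", aStatistic(accA, accB));
        // A >= 0.71 or A <= 0.29 would indicate a significant difference
    }
}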

From the results shown in Table 3, the proposed algorithm obtains the best overall predictive accuracy on the FunCat datasets and the second best accuracy on the GO datasets. hmAntMiner-C wins on the GO datasets, whereas it does not match the proposed algorithm on the FunCat datasets. To represent the search space topology, hmAntMiner-C layers the attribute-value pairs as a grid-like DAG topology, which leads to a much simpler search space than those of its competitors. It is therefore not surprising that hmAntMiner-C outperforms the other algorithms on the GO datasets, which are organized in a DAG structure. On each dataset, the "Rank" value indicates the performance ranking of the corresponding algorithm among all algorithms. The last row "A. rank" denotes the average rank over all datasets. The A. rank of the proposed algorithm is 1.81, which is equal to that of hmAntMiner-C. hmAntMiner, CLUS-HSC, and CLUS-SC achieve A. rank values of 2.75, 3.56, and 4.88, respectively. It is observed from the last row in Table 3 that the proposed algorithm and hmAntMiner-C perform the best in terms of predictive accuracy.

Besides predictive accuracy, we also compare the average classification model sizes of the different algorithms. For rule discovery classification algorithms, the number of rules reflects the size of the rule list, because each rule is associated with a class label. In decision tree algorithms, the leaf nodes are labeled with class labels, and their number reflects the classification model size. In Table 4, the model size of the proposed algorithm is the smallest on all the datasets. Also, in the last row of Table 4, the proposed algorithm achieves the lowest average rank, which indicates a better average performance than the other algorithms.

A statistical test of the performance differences between hmAntMiner-C, hmAntMiner, CLUS-HSC, CLUS-SC, and the proposed algorithm is shown in Tables 5 and 6. The results summarize the comparisons of the proposed algorithm (the algorithm with the best average rank) with the remaining algorithms used in our experiments according to the Vargha-Delaney A-test, in terms of predictive accuracy and classification model size. For each algorithm, the test results are reported on both the FunCat and Gene Ontology datasets. The last row "Better/Similar/Worse" indicates the numbers of datasets on which the proposed algorithm is significantly better than, similar to, and significantly worse than the other algorithm, respectively. In Table 5, the proposed algorithm shows comparable performance to hmAntMiner-C and statistically better performance than the other compared algorithms in terms of predictive accuracy. Regarding the classification model size, as shown in Table 6, the proposed algorithm obtains a significantly smaller model than the other algorithms on most of the test datasets. Overall, the proposed algorithm obtains the best compromise between prediction accuracy and classification model size considering both tree and DAG hierarchical structures.

4. Conclusion and Future Work

In this paper, we propose a novel ACO-based classification algorithm with high predictive accuracy and low model size. Some new features are introduced in the proposed algorithm. Firstly, a new roulette selection strategy is designed to distinguish the merits of the data attributes through attribute importance ranking; in this way, each ant can search for a better rule efficiently. Secondly, a new pheromone update strategy is presented to strengthen the degree of pheromone update and provide better guidance to the ants. The proposed algorithm can cope with the large increase in the number of uncharacterized proteins available for analysis and help determine their functions in order to improve current biological knowledge. These new features are implemented in our algorithm, and 16 publicly available datasets are used to evaluate its classification performance. When compared with four closely related classification algorithms, namely, hmAntMiner-C, hmAntMiner, CLUS-HSC, and CLUS-SC, the proposed algorithm performs superiorly or competitively in terms of predictive accuracy and obtains preferable comprehensibility. In future work, other components like local search [44–47] and differential operators [48] can be introduced to further improve the efficiency of the algorithm.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grants nos. 61471246 and 61672358), Innovation Foundation for Higher Education of Guangdong, China (Grant no. 2016KTSCX121), Guangdong Foundation of Outstanding Young Teachers in Higher Education Institutions (Grant no. Yq2013141), Guangdong Special Support Program of Top-Notch Young Professionals (Grant no. 2014TQ01X273), and Shenzhen Scientific Research and Development Funding Program (Grant no. ZYC201105170243A).