Abstract
In this work, we focus on the problem of selecting low-level heuristics in a hyperheuristic approach with offline learning for solving instances of different problem domains. The objective is to improve the performance of the offline hyperheuristic approach by identifying equivalence classes in a set of instances of different problems and selecting the best-performing heuristics in each class. In the proposed methodology, the first step takes a set of instances of all problems; the generic characteristics of each instance and the performance of the heuristics on each one are then used to define feature vectors and to group the instances into classes. Metalearning with statistical tests is used to select the heuristics for each class. Finally, we used a Naive Bayes classifier to test the set of instances with k-fold cross-validation, and we compared all results statistically with the best-known values. In this research, the methodology was tested on the capacitated vehicle routing problem (CVRP) and the graph coloring problem (GCP). The experimental results show that the proposed methodology can improve the performance of the offline hyperheuristic approach by correctly identifying the classes of instances and applying the appropriate heuristics in each case. This conclusion is based on a statistical comparison of the results obtained with the state-of-the-art results for each instance.
1. Introduction
In Computer Science, a heuristic is a technique designed to solve a problem when classical methods fail to find an exact solution or are too slow. Currently, there is great interest in the scientific community in offering ad hoc heuristic solutions for real-world optimization problems. Such heuristics require a priori knowledge of the problem and often produce good solutions in a reasonable time. However, the no free lunch theorem [1] states that no methodology or algorithm performs best on all problems; that is, ad hoc heuristics are usually not generalizable, and they do not always work well when applied to other problems, even when those problems share similar characteristics. This fact has directed research efforts towards the development of general-purpose search methodologies known as hyperheuristics, whose main characteristic is that they are independent of the problem domain.
Hyperheuristics can be classified according to their learning method: no learning, online learning [2], and offline learning [3]. In the context of combinatorial optimization, hyperheuristics are defined as “heuristics to choose heuristics” [4] or as “an automated methodology for selecting or generating heuristics to solve computational search problems” [4]. According to Pillay [2], the generality of a hyperheuristic can be viewed at three levels: generalization over instances of problems, generalization for a particular problem, and generalization over different types of problems, the last being the highest level. Some variations of hyperheuristics also depend on the nature of the heuristics.
One of the main challenges in hyperheuristics is to propose methodologies for generating and/or selecting the minimum set of heuristics that perform well for the problem at hand; this heuristic set is usually selected by expert researchers in the field [2]. To automatically select the heuristic that performs best for the problem, an approach called metalearning was proposed [5, 6], and its use in hyperheuristics can be found in Amaya et al. [7]. Likewise, the different meanings and the taxonomy of each interaction between metalearning and optimization were studied by Song et al. [8].
Since no methodology or algorithm performs best on all problems, our objective is to use information about the problem and the performance of the algorithms to provide knowledge to hyperheuristics. Metalearning generates metaknowledge, which we use to select better heuristics for solving problems. With this approach, we intend to propose methodologies that resemble human reasoning: a human expert learns from problems, their characteristics, their variables, and their restrictions and, after analysis, proposes the best tool and solves the problem.
In this paper, we propose a methodology to determine a subset of heuristics for hyperheuristics through metalearning and partitions, for solving different problems (described below) without ad hoc adjustments, by providing the hyperheuristic with information about the problem and the performance of the heuristics. It is well known that a correct characterization is the key to selecting the best heuristic [6]. This affects the hyperheuristic design, which is why our approach uses offline learning. Metalearning consists of two basic parts, the metafeatures and the metalearner; the former are generated from information about the problem and the solution algorithms, while the latter uses a grouping technique. Our methodology extends the classic metalearner approach by applying nonparametric statistical tests to determine which heuristics provide the same performance as the full set of heuristics.
To test our proposal, we used two well-known problem domains: the capacitated vehicle routing problem and the graph coloring problem. The capacitated vehicle routing problem (CVRP) involves criteria and restrictions related to distance, time, capacity, and delivery. It aims to find subtours of the n cities such that no city is repeated within a tour or across tours. The CVRP literature contains different variants that build on this basic definition and add extra restrictions. The graph coloring problem, in turn, consists of labeling each vertex of a given graph with one of k colors; it is a well-known problem that has been solved by exact methods, heuristics, metaheuristics, and hyperheuristics. Although each problem can be solved by ad hoc heuristics, to date there is no general methodology capable of solving all variants of both problems. The use of partitions for constraint satisfaction problems, such as university timetabling and VRP, has given good results [9, 10]. Although taxonomies of these problems exist, characterizing and classifying instances in a hyperheuristic context using metalearning with statistical tests is an approach that has not been explored in the literature.
Finally, to improve the design or choice of hyperheuristics and to predict the best algorithm according to the classification of the instance, we propose to use metalearning with a statistical analysis of the heuristics. Our proposal also provides information that helps us understand the performance of heuristics and hyperheuristics on the problem of interest.
The remainder of this article is organized as follows. Section 2 describes related work, covering heuristics, hyperheuristics, and metalearning. Section 3 reports the problem definitions and the theory related to heuristics. Sections 4 and 5 present the proposed methodology. Section 6 describes the results and findings, including a performance comparison. Finally, Section 7 presents concluding remarks.
2. Related Work
In this section, we give an overview of heuristic and hyperheuristic algorithms, define some basic concepts of metalearning, and discuss the pros and cons of the presented methods.
2.1. Heuristics
We conducted an extensive review of heuristics applied to CVRP and GCP. After a preliminary experimental analysis, we selected a total of 11 heuristics that apply to both problems; they are listed below.
The k-flip or k-opt heuristic was proposed by Lin and Kernighan [11] for the traveling salesman problem (TSP). This heuristic is based on the general interchange transformation, i.e., a city exchanges its position with another city on the same tour. It is one of the most popular heuristics for TSP [12] and has also been applied to other problems, such as planar graphs and unconstrained binary quadratic programming, and its complexity has been studied for SAT and MAX-SAT. The two-point perturbation is a special case of k-flip; a detailed description and algorithm of these heuristics are given in the following sections.
The k-swap heuristic is similar to k-flip and is frequently confused with it. As a perturbation move, k-swap performs better when it uses two or three movements [13].
The move to less conflict heuristic, also known as minimizing conflicts, was proposed by Minton et al. [14]. It has been applied to different areas of Computer Science, such as hyperheuristics, graph coloring problems, pickup-and-delivery problems, and scheduling problems. The move to less conflict heuristic is a variant of first fit; the only difference is that the former takes a random variable and changes its value to the one that generates the least cost.
The first-fit heuristic was studied by Baker [15] for the bin-packing problem. In recent decades, this heuristic has been applied to well-known problems such as bin packing, virtual machine relocation, and cutting stock. A notable variant is worst fit, which was studied by Baker [15] and Csirik [16], in particular in its application to the bin-packing problem.
Soria-Alcaraz et al. [17] proposed three heuristics for university course timetabling: best single perturbation (BSP), static-dynamic perturbation (SDP), and double dynamic perturbation (DDP), as part of the pool of low-level heuristics for a hyperheuristic. These heuristics were applied to the VRP in later research [18].
2.2. Hyperheuristics
We focus on selection hyperheuristics with offline learning and perturbation heuristics, whose aim is to gather knowledge, in the form of rules or programs, from a set of training instances. Offline selection hyperheuristics usually rely on machine learning methods, which are trained to create a methodology tuned for a problem domain [3]. Yates and Keedwell [19] demonstrated that an offline learning database contains subsequences of heuristics that are effective for some problem domains. They used an Elman network to compute sequences of heuristics, which were evaluated on unseen HyFlex example problems; the results showed that the approach is capable of intra-domain learning and generalization with 99% confidence.
One of the crucial issues in hyperheuristic design is the quality and size of the heuristic pool [20]. Soria-Alcaraz et al. [20] proposed a methodology that uses nonparametric statistics and fitness landscape measurements for hyperheuristic design. This methodology was tested on course timetabling and vehicle routing problems; the resulting hyperheuristic had a compact heuristic pool and was competitive with some traditional methods in course timetabling, where it obtained five best-known solutions out of 24 PATAT instances [21]. Finally, a recent report by Amaya et al. [7] documented a model for creating selection hyperheuristics with constructive heuristics. The effectiveness of this model depends on the delta values used; it is more useful with higher deltas.
2.3. Metalearning
The importance of metalearning, machine learning, and optimization has been studied by Song et al. [8]. The aim of metalearning is to accumulate and adapt experience on the performance of multiple applications of a learning system. The field is also known as “learning to learn” [22], and it provides systems that search for patterns across different tasks in order to control the process of exploiting cumulative expertise. Metalearning has been used with heuristics and metaheuristics for the TSP [23], the quadratic assignment problem, and hyperheuristics.
Gutierrez-Rodríguez et al. [23] addressed the VRP with time windows (VRPTW) and proposed a methodology based on metalearning to select the best metaheuristic for each instance. Their proposal shares and exploits, in an offline scheme, instance solutions from academia and industry. Their main contributions were a set of features for characterizing VRPTW instances and a classification process that predicts the most suitable metaheuristic for each instance. However, they assumed that the solutions of the instance set could be stored, shared, and exploited in an offline scheme for predicting good solvers for new, unseen instances.
The aim of this paper is not to present a survey on heuristics or hyperheuristics; our proposal is slightly different. It considers some vital aspects of previous research, including those from Yates and Keedwell [19]. We took the offline hyperheuristic approach from Soria-Alcaraz et al. [20] and the statistical approach to selecting a heuristic pool from Kanda et al. [5]. The offline hyperheuristic approach is an effective and popular method in the machine learning area [8]. The statistical approach to selecting a heuristic pool, in turn, is a useful and reliable method because it takes statistical information from the input data.
3. Combinatorial Problems
Our methodology is a general approach intended to achieve competitive performance across several classes of problems. Thus, we used two problem domains: the graph coloring problem and the vehicle routing problem. In the following sections, we review the formal definition of each problem as well as its benchmark instances.
3.1. Graph Coloring Problem
The graph coloring problem was shown to be NP-hard by Karp [24]. According to [25], a vertex coloring of a graph $G = (V, E)$, where $E$ is a finite set of unordered pairs of vertices named edges, is a function $c: V \to \{1, \dots, k\}$ in which any two adjacent vertices are assigned different colors, that is, $c(u) \neq c(v)$ for every edge $\{u, v\} \in E$. The function $c$ is the coloring function, and a graph for which there exists a vertex coloring that requires $k$ colors is called k-colorable. The coloring function induces a partition of the graph into $k$ independent subsets $V_1, \dots, V_k$, where $V_i \cap V_j = \emptyset$ for $i \neq j$ and $V_1 \cup \dots \cup V_k = V$. The benchmark instances can be found at http://mat.gsia.cmu.edu/COLOR/instances.html. This lets the partitioning methodology work on the input design and avoids ad hoc modifications to the heuristics, since only a different objective function, one that adequately evaluates the instances of this problem, needs to be supplied.
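The definition above can be illustrated with a minimal Python sketch. It assumes that a graph is given as a list of edges and a coloring as a dictionary from vertex to color; the function names and the use of the number of conflicting edges as a fitness value are illustrative assumptions, not the objective function used in this work.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def count_conflicts(edges: Iterable[Edge], coloring: Dict[int, int]) -> int:
    """Count the edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def is_proper_coloring(edges: Iterable[Edge], coloring: Dict[int, int]) -> bool:
    """A coloring is proper when no edge joins two vertices of the same color."""
    return count_conflicts(edges, coloring) == 0


def color_classes(coloring: Dict[int, int]) -> Dict[int, Set[int]]:
    """Partition induced by the coloring: each color maps to its set of vertices."""
    classes: Dict[int, Set[int]] = {}
    for vertex, color in coloring.items():
        classes.setdefault(color, set()).add(vertex)
    return classes


# Hypothetical example: a cycle on four vertices.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
good = {0: 0, 1: 1, 2: 0, 3: 1}   # proper 2-coloring
bad = {0: 0, 1: 0, 2: 1, 3: 1}    # two conflicting edges
```

Here `color_classes(good)` yields the two independent subsets of the partition, and `count_conflicts` can serve as a simple penalty term when a heuristic move produces an improper coloring.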
3.2. Capacitated Vehicle Routing Problem
The capacitated vehicle routing problem (CVRP) is a variant of the VRP [26]. In this problem, we have an undirected graph, a fleet of vehicles with a given capacity, and a set of cities. One of the cities is the depot, and each vehicle must visit its assigned cities starting from the depot and returning to it. Alba and Dorronsoro [27] define a distance or travel time matrix between each pair of cities. Each city has a demand of goods. A route is a permutation of a subset of the cities that starts and finishes at the depot, and each route has an associated cost. The cost of a problem solution is the sum of the costs of its routes, taken over the k vehicles. The problem aims to determine, for each vehicle, the tour with the lowest cost (see equations (1) and (2)), measured as distance or travel time, without exceeding the maximum capacity. Note that the hard constraints are the capacity of each vehicle and the requirement that two vehicles cannot visit the same city. The CVRP has several constraints and a specific formal definition. These two characteristics let us apply our methodology by designing a partition of the cities in which each vehicle is related to one part. Because the heuristics work with solutions that are already complete and that respect important restrictions such as capacity, any movement that exceeds this capacity is penalized in the objective function.
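The cost of a solution and the capacity penalty described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: city 0 is taken as the depot, each route lists the cities of one vehicle in visiting order, and the penalty is proportional to the excess load. The penalty form, its weight, and all names are hypothetical and are not taken from equations (1) and (2).

```python
from typing import List, Sequence


def route_cost(route: Sequence[int], dist: Sequence[Sequence[float]],
               depot: int = 0) -> float:
    """Length of the tour depot -> cities of the route in order -> depot."""
    path = [depot, *route, depot]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def solution_cost(routes: List[List[int]], dist: Sequence[Sequence[float]],
                  demand: Sequence[float], capacity: float,
                  penalty: float = 1000.0, depot: int = 0) -> float:
    """Sum of route costs plus a penalty for every overloaded vehicle.

    The proportional penalty is an assumption made for illustration.
    """
    total = 0.0
    for route in routes:
        total += route_cost(route, dist, depot)
        overload = sum(demand[city] for city in route) - capacity
        if overload > 0:
            total += penalty * overload
    return total


# Hypothetical instance: depot 0 and three cities.
dist = [[0, 2, 3, 4],
        [2, 0, 1, 5],
        [3, 1, 0, 2],
        [4, 5, 2, 0]]
demand = [0, 3, 4, 2]
feasible = solution_cost([[1, 2], [3]], dist, demand, capacity=7)
overloaded = solution_cost([[1, 2, 3]], dist, demand, capacity=7)
```

In this example the two-vehicle solution respects the capacity, while the single-vehicle solution is shorter in distance but is pushed far above it by the penalty term.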
4. Methodology
For our methodology, it is important to determine, from the constraint-modeling phase, whether the problem can be solved by partitions. The API-Carpio methodology and the methodology proposed by Soria-Alcaraz et al. [28] let us transform the instances of a problem, with their restrictions, into inputs to which the proposed methodology can be applied. The MMA matrix generated by the API-Carpio methodology lets us visualize the hard restrictions of the problem and evaluate the costs of visiting cities or nodes. For the soft constraints of the problem, the methodology proposed by Soria-Alcaraz et al. [28] is used to include them in the list of restrictions.
In this section, we describe how the input data of the two combinatorial problems used in the experiments are modeled. To this end, we integrated the API-Carpio methodology [29] and the methodology of design proposed by Soria-Alcaraz et al. [28].
4.1. API-Carpio Methodology
This methodology was originally used to solve the university course timetabling problem, and it considers three factors: students, teachers, and institutions (infrastructure). The methodology uses several structures for the equations previously described. One of the most important structures in this work is the MMA matrix, which is constructed with information about the cities or nodes. For graph coloring, we use the adjacency matrix, while for CVRP we use the cost matrix. Table 1 shows an example of an MMA matrix. The algorithm to construct this matrix is given in [29].
4.2. Methodology of Design
This methodology extends the proposal by Carpio [29], and its formal definition was given by Soria-Alcaraz et al. [28]. The methodology of design allows us to consider the objectives of course timetabling and to satisfy the different restrictions by converting them into lists of time and space restrictions, seeking to minimize student conflicts.
To use this methodology, two structures are used to represent restrictions and variables: the MMA matrix and the LPH. The LPH lists, for each node or city, the parts to which it can be assigned. An example of this list can be found in Table 2. Each row of the list shows the part numbers allowed for a node; e.g., a node can be assigned to parts 1, 2, 4, 5, 7, or 8, but not to part 6. The algorithm for generating artificial instances of LPH can be found in Ortiz-Aguilar [30].
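As an illustration of the two structures, the following Python sketch builds an MMA matrix from a list of edges and an LPH from the number of parts. It is not the construction algorithm of [29] or [30]; the function names, the 0-based node indices, the 1-based part numbers, and the `forbidden` argument are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def mma_from_edges(n: int, edges: Iterable[Tuple[int, int]],
                   weight: int = 1) -> List[List[int]]:
    """Symmetric n x n matrix; a zero entry means there is no connection."""
    mma = [[0] * n for _ in range(n)]
    for u, v in edges:
        mma[u][v] = weight
        mma[v][u] = weight
    return mma


def build_lph(n: int, parts: int,
              forbidden: Optional[Dict[int, Set[int]]] = None) -> List[List[int]]:
    """For every node, list the parts (1..parts) to which it may be assigned."""
    forbidden = forbidden or {}
    return [
        [p for p in range(1, parts + 1) if p not in forbidden.get(node, set())]
        for node in range(n)
    ]


# Hypothetical example: a path on three nodes, five parts (colors),
# and node 1 not allowed in part 3.
mma = mma_from_edges(3, [(0, 1), (1, 2)])
lph = build_lph(3, 5, forbidden={1: {3}})
```

For the CVRP, the same `build_lph` call would use the number of vehicles as `parts`, and the MMA would hold the distances between cities instead of adjacency weights.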
5. Metalearning for Selecting a Subset of Heuristics for Hyperheuristics
According to Brazdil et al. [22], the purpose of metalearning (ML) is to support the selection of an algorithm for a set of instances by means of metadata. According to Alpaydin [31], the aim of metalearning is to find the best classifier for a set of data, given a characterization of those data. In our proposal, the data correspond to instances of different problems, and the classifier corresponds to the set of heuristics. Our objective is to select the best set of heuristics for a hyperheuristic on a set of instances. Figure 1 shows a diagram of the metalearning process used to obtain metaknowledge for the selection of heuristics (adapted from Brazdil and Giraud-Carrier [32] to our methodology).
In this work, we use metalearning to select a set of heuristics for hyperheuristics on a dataset. We call the set of characterized instances of the two problem domains the “metacharacteristics” and the model that maps each instance to the corresponding group of heuristics for hyperheuristics the “metalearner.” In this case, the metalearner selected is the k-means algorithm. The methodology proposed in this article consists of 5 steps in the metalearning stage:
(i) Step 1: obtain the set of instances to be worked on. Our criterion is to select instances that can be solved by partitions.
(ii) Step 2: evaluate the instances and extract their characteristics. In this step, the characteristics of the heuristics and the instances are generated. Heuristics that apply to both problems are selected; the partitioning methodology makes this task simple because it always works with generic inputs in which the variables and restrictions are modeled. The heuristics then work with these generic inputs and with solutions that only need a fitness function corresponding to their problem (where the objectives are evaluated).
(iii) Step 3: generate the metacharacteristics. Based on the characteristics of each problem and the performance of the heuristics applied to all instances, we generate the feature vectors that will be our metadata.
(iv) Step 4: metalearning and the recommended model of heuristics. In the state of the art, research is limited to applying a clustering technique alone for the recommendation of the algorithm model. We propose to incorporate a statistical analysis together with the clustering algorithm to improve the design of the basic subset of heuristics.
5.1. Problem Definition of Metalearning for Heuristic Selection
Consider a problem that belongs to the problem set (GCP). Let be a subset of lowlevel heuristics, which are used in the state of the art, to solve the problem . We denote as a random selection of heuristics , to be applied in the solution of , where , with to a reduced set of heuristics, that is, .
Let , where represents an empty string. We defined recursively where and . Then represents the set of all strings of length , formed from the symbols in . So Kleene’s closure from is [33]
When is omitted at the junction, we get the Kleene Plus closure :
In other words, is the collection of all possible nonempty strings of finite length generated from the symbols in .
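For reference, the standard formal-language definitions that the preceding paragraphs describe can be written as follows. The symbol $\Sigma$ for the set of heuristic symbols and $\varepsilon$ for the empty string are assumed names, chosen here only for illustration, and may differ from the notation of the original equations.

```latex
\begin{align}
  \Sigma^{0}   &= \{\varepsilon\}, \\
  \Sigma^{i+1} &= \{\, wv : w \in \Sigma^{i},\ v \in \Sigma \,\}, \quad i \ge 0, \\
  \Sigma^{*}   &= \bigcup_{i \ge 0} \Sigma^{i}, \\
  \Sigma^{+}   &= \bigcup_{i \ge 1} \Sigma^{i}.
\end{align}
```

Under these definitions, $\Sigma^{i}$ is the set of all strings of length $i$, $\Sigma^{*}$ is the Kleene closure, and $\Sigma^{+}$ is obtained by leaving $\Sigma^{0}$ out of the union, so it contains only nonempty strings.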
Let be a heuristic selection hyperheuristic with offline training, where the training considers the set . After training, the provides a methodology with the best order of application of lowlevel heuristics, which we will denote by in the solution of the problem , which improves the performance of the application of a , where [17].
We take two problems and that belong to the problem set (GCP) that comply with level 3 of generality proposed by Pillay [2] and are susceptible to being solved by partitions. Let and be subsets of lowlevel heuristics, which are used in the state of art, to solve the problems and , and = . We denote as a random selection of the heuristics , to be applied in the solution of and , where , with to a reduced set of heuristics, that is, .
Heuristic selection hyperheuristic is denoted as , with offline training, where the training considers the set. After training, the provides a methodology with the best order of application of lowlevel heuristics, which we will denote by in solving the problems which improve the performance of the application of a simple , where .
Our objective is to propose a methodology that provides the , with a reduced subset for its training, such that the provides a methodology , with the best order of application of the reduced set of heuristics , which we will denote by , in solving the problems and respectively, which equal the performance than with the application of the methodology where .
To solve this problem, we propose applying each heuristic of the two sets independently and measuring its performance on both problems. Statistical tests are then applied to contrast the performance of the individual heuristics and thereby discard from each set the heuristics that obtained the lowest performance.
Next, we describe the stage of extracting characteristics from heuristics and instances. Our methodology improves the metalearning stage (step 4) by applying nonparametric statistical tests to determine which heuristics provide the same performance as the full set of heuristics. This means that redundant heuristics can be left out, keeping only those that speed up the search for solutions to the problem. The metalearning process proposed in this work to select the pool of heuristics includes the following steps:
(1) The problems are the source of information for the basic features.
(2) Given a set of instances, apply each heuristic a number of times to each instance. The results are the inner features. For the CVRP, a greedy heuristic is used to build feasible solutions to the problem. For graph coloring, we initialize with a random construction heuristic. The next step is to apply the heuristics. Depending on their complexity, some instances may already be solved to the best solution after this step. This means that a complete and expensive computation process can be avoided when a problem is solved by applying a simple heuristic.
(3) With the information obtained in steps 1 and 2, form the feature vectors that will be our metadata.
(4) For a better treatment of the metadata from the previous step, the following steps are carried out:
(a) Scale the patterns that will be used in k-means with the formula given in [34], applied to the values of the original variables (features).
(b) Generate the pattern based on the inner and basic features per instance. The basic features are the problem information, and the inner features are the fitness values obtained by the heuristics over their repeated applications to the problem.
(5) Pass the feature vectors to the clustering algorithm to form classes. According to Brazdil and Giraud-Carrier [32], k-means is a simple learning method, which we apply to group the instances into classes. To determine the number of classes to form, we use Sturges' rule [35], $k = 1 + \log_2 T$, where T is the total amount of data; the rule gives a good approximation when the amount of data is a power of 2. As the distance metric for k-means, we use the Mahalanobis distance, which has properties such as invariance under nonsingular linear transformations, including changes of scale. An in-depth study of different metrics [36, 37] is left as a specific job to investigate whether it can improve the performance of the proposed methodology.
(6) Label each pattern with the number of the group in which the pattern (instance) was classified.
(7) Apply again the three statistical tests to the results of the heuristics per problem, according to the formulas in [38]. The tests assign rank 1 to the best-performing heuristic, 2 to the second best, and n to the worst-performing heuristic. From these tests, we take the rank of each heuristic, which is now considered an inner feature. This information, together with the class label, forms the new patterns.
(8) Determine a cutoff point for each class based on the ranks; in this case, it is the average of the minimum and maximum rank. Choose the heuristics that pass the cutoff criterion to be part of the minimum set.
(9) The output is the minimum set of heuristics per class.
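Steps 5, 7, and 8 can be sketched in Python as follows. The sketch rests on several assumptions: the number of classes from Sturges' rule is rounded up, the statistical tests of [38] are replaced by a plain mean rank over instances (lower fitness is better, ties share the mean rank), and a heuristic "passes" the cutoff when its rank is at most the cutoff. The function names and the sample fitness values are hypothetical.

```python
import math
from typing import Dict, List


def sturges_classes(total: int) -> int:
    """Number of classes by Sturges' rule, 1 + log2(T), rounded up (assumption)."""
    return 1 + math.ceil(math.log2(total))


def average_ranks(fitness: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean rank of each heuristic over all instances (rank 1 = lowest fitness)."""
    names = list(fitness)
    n_instances = len(fitness[names[0]])
    totals = {name: 0.0 for name in names}
    for i in range(n_instances):
        ordered = sorted(fitness[name][i] for name in names)
        for name in names:
            value = fitness[name][i]
            first = ordered.index(value) + 1   # first 1-based position of the value
            ties = ordered.count(value)        # tied values share the mean rank
            totals[name] += first + (ties - 1) / 2
    return {name: totals[name] / n_instances for name in names}


def select_by_cutoff(ranks: Dict[str, float]) -> List[str]:
    """Keep heuristics whose rank is at most the mean of the min and max rank."""
    cutoff = (min(ranks.values()) + max(ranks.values())) / 2
    return sorted(name for name, rank in ranks.items() if rank <= cutoff)


# Hypothetical fitness values of three heuristics on three instances of one class.
fitness = {"h1": [10, 12, 9], "h2": [11, 12, 15], "h3": [20, 18, 17]}
ranks = average_ranks(fitness)
pool = select_by_cutoff(ranks)
```

With these sample values, the third heuristic is ranked last on every instance and falls above the cutoff, so the reduced pool for this class contains only the first two heuristics.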
This process is shown in detail in Figure 2. The two important aspects of metalearning in our work are the low-level heuristics and the metafeatures.
5.2. Low-Level Heuristics
An important part of the hyperheuristic approach is the selection of the heuristic set. This article proposes to extract information from heuristics and problems to generate the metafeatures [32], which lets us improve the design and testing of the hyperheuristic algorithm. The goal of this stage is to generate metafeatures that reflect which heuristics perform better individually across all problem instances. This benefits the next stage, in which the hyperheuristic must choose the order of application of the heuristics using a minimal heuristic pool, which is a fundamental part of it [2]. Each heuristic is applied a number of times to every instance. The heuristics applied to the two problems and their respective instances are as follows:
(1) K-Flip/Simple Random Perturbation (SRP). The heuristic changes the value of one or more variables to another feasible value. In GCP, the move changes the color of a node to another color [39]. In CVRP, the move reassigns a city to another vehicle [12].
(2) K-Swap/Kempe Chain Neighborhood/S-Chain Neighborhood. Two or more variables are selected and their values are interchanged when possible; otherwise, no change is made. In GCP, the colors of the previously selected nodes are exchanged. This heuristic, also called k-interchange, is used in works related to TSP and CVRP [40, 41].
(3) Best Single Perturbation (BSP). This heuristic chooses a variable according to the list of hard restrictions (LPH) and changes its value so that the cost improves or, in the worst case, stays the same [17]. The next time the heuristic is applied, it chooses the variable in the position following the last modified variable. In GCP, the next node whose color changes is selected according to the last variable chosen. In CVRP, a city is moved from its vehicle to another vehicle.
(4) Static-Dynamic Perturbation (SDP). It is also known as statically dynamic perturbation. It selects a variable using a probability distribution based on the frequency of changes in the last iterations and changes its value randomly [17]. The variables with fewer changes have a higher probability of being selected. In GCP, this is a node with few color changes; in CVRP, it is a city that has been moved to another vehicle only a few times.
(5) Two Points Perturbation (2pp). It is also known as k-opt, and it is a particular case of K-Swap with $k = 2$.
(6) Double Dynamic Perturbation (DDP). This heuristic is based on SDP: it receives a solution and modifies the value of a variable according to a probability distribution. The difference is that a copy of the initial solution is kept and, at the end, the better of the two solutions is returned [17].
(7) Move to Less Conflict (MLC). This heuristic selects a random variable and assigns it to the part that generates the least cost [18]. In GCP, the color is changed to another that improves the fitness; in CVRP, the city is moved to another vehicle such that the total distance of the route is minimized.
(8) Min-Conflicts. The heuristic selects a random variable and assigns it to the part that generates the lowest cost [18]. In GCP, the heuristic changes the color of the selected node to another that improves the result. In CVRP, the selected city is moved to the vehicle with the lowest cost so as to minimize the total distance of the route.
(9) First-Fit. It changes the value of a variable to the value that is least repeated among the other variables [18]; i.e., in CVRP, the heuristic takes a city and moves it to the vehicle that has the fewest cities in its route. In GCP, it selects a node and assigns it the least repeated color.
(10) Worst Fit. It assigns the most repeated value to a randomly selected variable, if possible without violating the hard constraints [42]. For GCP and CVRP, a node or city is assigned to the most repeated timeslot, color, or vehicle.
(11) Burke–Abdullah (BA). This heuristic was proposed by Abdullah et al. [43]. It chooses a variable by applying the Fail-First or Brelaz heuristic [44], and its value is changed according to whichever of the following algorithms has obtained the best performance: minimum conflict, random selection, sequential selection, and least constrained.
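The first two moves in the list above can be sketched in Python on a generic partition representation. The sketch assumes that a solution is a list mapping each variable (node or city) to a part (color or vehicle) and that the LPH is a list of allowed parts per variable. The function names, the cyclic rotation used when more than two variables are selected, and the rule of reverting the whole move on a violation are assumptions for illustration, not the implementations used in the experiments.

```python
import random
from typing import List, Optional


def k_flip(solution: List[int], lph: List[List[int]], k: int = 1,
           rng: Optional[random.Random] = None) -> List[int]:
    """Return a copy in which k random variables move to another allowed part."""
    rng = rng or random.Random()
    new = list(solution)
    for var in rng.sample(range(len(new)), k):
        options = [part for part in lph[var] if part != new[var]]
        if options:  # leave the variable untouched if no other part is allowed
            new[var] = rng.choice(options)
    return new


def k_swap(solution: List[int], lph: List[List[int]], k: int = 2,
           rng: Optional[random.Random] = None) -> List[int]:
    """Rotate the values of k random variables; revert if a value is not allowed."""
    rng = rng or random.Random()
    chosen = rng.sample(range(len(solution)), k)
    new = list(solution)
    for src, dst in zip(chosen, chosen[1:] + chosen[:1]):
        new[dst] = solution[src]
    if all(new[var] in lph[var] for var in chosen):
        return new
    return list(solution)


# Hypothetical example: four variables, three parts, no restrictions.
solution = [1, 2, 3, 1]
lph = [[1, 2, 3] for _ in solution]
flipped = k_flip(solution, lph, k=1, rng=random.Random(0))
swapped = k_swap(solution, lph, k=2, rng=random.Random(0))
```

Both functions return a new list and leave the input unchanged, which makes it easy to compare the fitness before and after a move and to keep the better solution.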
5.3. Metafeatures
The description and generation of characteristics permit differentiation into at least two groups of instances within the same problem class. We used the terms of basic feature and based on the proposal conducted by GutierrezRodríguez et al. [23]. As basic features, these are given by the problem, e.g., the number of nodes, colors, vehicles, and so on, depending on each problem information. For both classes of problems, the number of different is summarized in Table 3.
The inner features are the fitness performance values of all heuristics. The pattern of each instance is therefore basic features + inner features. The final pattern is shown in Table 4. For example, instance 1 has the pattern (3, 3, 8, 50, 3, 2, 1, 4), and the number of features depends on the heuristic pool (eight features in the given example).
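The construction of a pattern can be sketched as a simple concatenation. The function name and the numeric values below are hypothetical and only illustrate the layout "basic features followed by one fitness value per heuristic"; they do not reproduce any row of Table 4.

```python
def build_pattern(basic_features, heuristic_fitness):
    """Concatenate problem-given features with the fitness obtained by each heuristic."""
    return tuple(basic_features) + tuple(heuristic_fitness)

# Hypothetical instance: 3 basic features followed by the fitness of 5 heuristics.
pattern = build_pattern(basic_features=(11, 20, 4), heuristic_fitness=(6, 5, 5, 7, 4))
```

With this layout the pattern length grows with the size of the heuristic pool, which is why the number of features depends on the pool.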
6. Methodology for Determining a Subset of Heuristics
In this section, we propose a new approach for selecting and determining a subset of heuristics to solve GCP and CVRP instances. The steps of our methodology are described next, and its graphical representation is shown in Figure 3.
(1) Variables and Problem Restrictions Identification. First, the variables and restrictions of the problem are identified according to the aims or objectives of the problem. To model the GCP, the values in the MMA matrix represent the weights of the edges between nodes. A zero in a given position of the matrix means that there is no connection between those nodes. For graph coloring, each node is colored so that adjacent nodes do not have the same color. The CVRP fits our methodology because its aim is to obtain subroutes (subgroups) whose tour cost is the minimum.
(2) Problem’s Restriction Modeling. In both problems, we must design a partition of nodes or cities. First, it is necessary to model the restrictions for each variable in an LPH, e.g., a node that cannot be colored with a specific color or a city that is restricted for a tour. Then, the MMA, which represents the edge or connection weights between nodes or cities, must be designed. In GCP, the MMA corresponds to the adjacency matrix, and in CVRP, the MMA is the matrix of node-to-node distances. For GCP, our LPH is constructed based on the number of colors with which the nodes can be labeled. If the problem restricts the colors to five, the list will be like the one shown in Table 2. Similarly, this list is built for the CVRP, where the number of vehicles is the number of parts that should be represented in the list (see Table 2). For the problems used in this work, it was not necessary to elaborate additional structures for soft restrictions. For an extensive review of how to model additional restrictions, the research by Ortiz [10] details all possible cases and different features.
(3) Apply the metalearning process described in Section 5.1.
(4) Separate the patterns (step 6) into training and test sets to proceed to the classification phase. It is important to include at least one pattern of each class in the test set.
(5) Use the classifier on the training set to make the necessary adjustments to it. After describing and obtaining all pattern characteristics per instance, the next step is to train and test all instances with a classifier. For our approach, we preferred a simple classifier, the Naive Bayes classifier (NBC), because our objective was neither to compare the performance of classification algorithms nor to design ad hoc classifiers for our research. The NBC simplifies learning by assuming that, within each class, all features are independent [45]. In our methodology, we assume that the performance of each heuristic is independent because each heuristic was applied in independent experiments: in the previous stage, each experiment was run with only one heuristic, and thus we did not apply two or more heuristics at a time. Finally, all features in the created dataset were normalized before applying the classifier.
(6) Finally, the classifier assigns a “class” to each instance of the test set, which is then solved with its corresponding set of heuristics.
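Steps (4) and (5) mention two preprocessing operations, a split that keeps every class represented in the test set and a normalization of the features. The sketch below shows one possible way to do both. Min-max scaling, the 10% test fraction, and the function names are assumptions of this example, since the text does not state which normalization or split procedure was used.

```python
import random
from collections import defaultdict

def min_max_normalize(patterns):
    """Scale every feature column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*patterns))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in patterns
    ]

def split_with_class_coverage(labels, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices) with at least one test pattern per class.

    Caveat: a class with a single pattern ends up only in the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    test = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])

    test_set = set(test)
    train = [i for i in range(len(labels)) if i not in test_set]
    return train, sorted(test)
```

In practice the scaling bounds would be computed on the training patterns only and then reused for the test patterns, so that no information from unseen instances leaks into training.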
6.1. Designing and Testing the Hyperheuristic Offline Learning with K-Folds
To choose the minimal set of heuristics and design the hyperheuristic for each class, our methodology uses hyperheuristics with offline training, which have demonstrated good results for constraint satisfaction problems in terms of solution generality [3]. A random constructive heuristic was used to generate solutions for GCP, and a greedy algorithm was used for CVRP. A selection hyperheuristic algorithm has three components: the pool of operators (low-level heuristics), a high-level search strategy, and a control mechanism that selects the operator to be applied at each search step.
6.1.1. High-Level Search Strategy
The iterated local search algorithm was used as the high-level search strategy. This metaheuristic was proposed by Lourenço et al. [46], and it constructs a sequence of solutions generated by an embedded heuristic, which tends to yield better solutions than purely random construction. The essence of this algorithm is to intensify an initial solution by exploring its neighboring solutions. The algorithm is shown in Algorithm 1, which was taken from El-Ghazali [47]. In the field of hyperheuristics, offline learning refers to the fact that the high-level search strategy searches for a methodology (a sequence of heuristics) that solves a set of instances and then applies it to a given set of instances, in contrast to online learning, which refers to the construction of a sequence of heuristics as the instances are presented.
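The general scheme of iterated local search can be sketched as follows. This is a generic outline under stated assumptions, not a transcription of Algorithm 1: the acceptance rule (keep the candidate if it is not worse), the default of 10 iterations, and the callback names `cost`, `local_search`, and `perturb` are choices made for this example.

```python
def iterated_local_search(initial, cost, local_search, perturb, iterations=10):
    """Generic iterated local search that minimizes `cost`.

    local_search -- maps a solution to a locally improved solution
    perturb      -- maps a solution to a modified solution (diversification)
    """
    best = local_search(initial)
    for _ in range(iterations):
        # Perturb the incumbent, then intensify around the perturbed solution.
        candidate = local_search(perturb(best))
        # Assumed acceptance criterion: accept candidates that are not worse.
        if cost(candidate) <= cost(best):
            best = candidate
    return best
```

In a hyperheuristic setting, the "solution" handled by this loop would be a sequence of low-level heuristics, and `cost` would evaluate that sequence on the training instances.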

6.1.2. Selection Operator
In the perturbation phase (step 4 in Algorithm 2), it is necessary to choose a variable following a probability distribution based on the frequency of variable selection in the last iterations. This simple heuristic allows us to modify the methodology solution. We used the same hyperheuristic framework and, according to each class, we provided a different pool of low-level heuristics. For example, if for class 1 the best heuristics were , , , and , these were our pool for the hyperheuristic. The design was done for each class with a similar procedure. After this process, we trained each hyperheuristic on its respective class for the next stage.

7. Experimental Results
This section describes in detail our experiments on the graph coloring and CVRP benchmarks used in this paper. We give the configuration for the implementation of the iterated local search hyperheuristic. Finally, we describe the statistical tests that we used to compare our results with the experimental methodology.
Our approach was implemented in Java (JDK 1.8) using the NetBeans IDE 8.2. The experiments were executed on a computer with an Intel i7-7700U processor at 2.6 GHz, 16 GB of DDR3 RAM, and the Windows 10 Home operating system. The tests presented in this work were executed on a common notebook with a single processor, which indicates that the proposed methodology is effective on modest hardware.
For each heuristic, a limit of 100,000 function calls was given in each test run for all instances. We applied the Shapiro–Wilk test to check whether the results were normally distributed and thus choose the most representative value (mean or median). If the data followed a normal distribution, the mean was taken as the representative value; otherwise, the median was taken.
7.1. Heuristics Results for Graph Coloring and CVRP
7.1.1. Graph Coloring
We used the benchmark proposed for the second DIMACS challenge on graph coloring [48], and each instance was tested with 41 runs. Tables 5 and 6 show our results. The best results are denoted in boldface; only the myciel2 instance was solved to optimality by the individual heuristics.
We applied nonparametric tests to verify whether there are differences between the performance of the heuristics. Table 7 shows the ranks obtained in the three statistical tests for the graph coloring instances. The three omnibus tests indicated that there are significant differences between the heuristics. The heuristics which obtained the highest ranks in the three tests were and , i.e., these have the worst performance. The best results for the problems were obtained by and , because in all omnibus tests those heuristics obtained the lowest ranks (marked in bold in Table 7). The time of each run is reported in Table 8.
7.1.2. Capacitated Vehicle Routing Problem (CVRP)
Four sets from the state of the art were used, and each was tested with 41 runs:
(1) Augerat et al. (SET A), 9 instances, proposed in [49]
(2) Christofides, Mingozzi, and Toth (CMT), 14 instances, proposed in [50]
(3) Golden, Wasil, Kelly, and Chao (GWKC), 20 instances, proposed in [51]
(4) Uchoa et al., 9 instances, proposed in [52]
Table 9 shows the fitness values for the instances, with the lowest tour cost indicated in bold, together with the number of nodes, the capacity of each vehicle, and the number of vehicles (colors in the case of graph coloring). The time of each run is reported in Table 10.
We applied the same procedure with the statistical tests of Friedman (FT), Aligned Friedman (AFT), and Quade (QT) to distinguish the behavior of the set of heuristics. We fixed a significance level and established as the null hypothesis that there are no differences between the performance of the heuristics, and as the alternative hypothesis that there are differences between the performance of the heuristics. Table 11 shows the ranks obtained in the three statistical tests.
In this case, the heuristic has the lowest rank for the tests and has the second-lowest rank for QT and FT.
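To make the notion of rank concrete, the sketch below computes the average ranks used by the standard Friedman test and its chi-square statistic (without tie correction). It assumes a matrix `results[i][j]` holding the representative fitness of heuristic `j` on instance `i`, where lower is better; the function names are illustrative. The Aligned Friedman and Quade tests use different ranking schemes and are not covered here.

```python
def average_ranks(results):
    """Average rank of each heuristic over all instances (rank 1 = best, ties share ranks)."""
    n_heuristics = len(results[0])
    totals = [0.0] * n_heuristics
    for row in results:
        order = sorted(range(n_heuristics), key=lambda j: row[j])
        pos = 0
        while pos < n_heuristics:
            # Find the block of tied values starting at position pos.
            end = pos
            while end + 1 < n_heuristics and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for k in range(pos, end + 1):
                totals[order[k]] += shared
            pos = end + 1
    return [t / len(results) for t in totals]

def friedman_statistic(results):
    """Friedman chi-square statistic for n instances and k heuristics."""
    n, k = len(results), len(results[0])
    ranks = average_ranks(results)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
```

A heuristic with a low average rank performed well on most instances, which is the sense in which the lowest-ranked heuristics are reported as the best ones.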
7.2. Selection of Features and Classes by Statistical Tests
According to the steps mentioned in Section 5.1, we must determine first the number of clusters or classes to split all our test instances. In this case, and .
We considered 8 classes and used k-means clustering, expecting the instances to be uniformly distributed among the clusters. The k-means algorithm was applied with a maximum number of , initially random starting points. To favor a uniform distribution of the instances among the clusters, we used the Manhattan distance, which gave the best results in our experimental work.
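A clustering step of this kind could look like the following sketch, which assigns each pattern to the nearest center under the Manhattan distance and recomputes centers as coordinate-wise means. The iteration limit, the seeding, and the mean-based center update are assumptions of this example; note that with the Manhattan distance a median update (k-medians) would be the variant that minimizes the within-cluster distance.

```python
import random

def manhattan(a, b):
    """Manhattan (L1) distance between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans_manhattan(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (assignment, centers)."""
    rng = random.Random(seed)
    # Random starting points drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: manhattan(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no pattern changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers
```

The resulting cluster index of each pattern would then serve as its class label for the classification phase.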
Table 12 contains the class details: the number of instances per cluster/class, the number of GCP (3rd column) or CVRP (4th column) instances per class, the minimum and maximum number of nodes, and the minimum and maximum number of colors. In this experimentation, clusters 1, 5, 6, and 7 have only GCP instances, clusters 3, 4, and 8 have only CVRP instances, and only cluster 2 has instances of both problem domains.
7.3. Training and Test Classifiers for the Instance’s Classes
After the heuristic pool design phase for the hyperheuristics, we split our dataset into training and test sets. The training dataset comprised 125 instances with 15 features (basic + inner), and the unseen set comprised 15 instances. The results of the classification with Naive Bayes are reported in Table 13.
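For readers unfamiliar with the classifier, a compact Naive Bayes sketch is given below. The text does not state which likelihood model was used, so the Gaussian per-feature likelihood, the small variance floor, and the class and method names are assumptions of this example rather than a description of the authors' implementation.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian likelihood per feature and class."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            per_feature = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                # Variance floor avoids division by zero for constant features.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                per_feature.append((mean, var))
            self.stats[label] = per_feature
        return self

    def predict_one(self, row):
        """Return the class with the highest log-posterior for one pattern."""
        best_label, best_score = None, -math.inf
        for label, per_feature in self.stats.items():
            score = math.log(self.priors[label])
            for x, (mean, var) in zip(row, per_feature):
                score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Summing per-feature log-likelihoods is exactly where the independence assumption enters: each heuristic's fitness contributes to the class score separately.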
Table 14 contains the confusion matrix of the classification process. We observed that, for some classes such as 3, 4, 7, and 8, all patterns were classified correctly. The rest of the classes have some patterns classified incorrectly; however, for the 3 patterns of class 1 that were classified into classes 5 and 6, for example, the same pool of low-level heuristics is used, so this does not represent an issue for the next step.
7.4. Designing and Testing the Hyperheuristic Offline Learning with K-Folds
In the next step, the statistical tests were applied per class to the heuristics whose performance forms the characteristics of our graph coloring and CVRP instances. In this phase, we choose according to the rankings the heuristics which have . If, for example, the Aligned Friedman has a min rank and max rank , the limit value for considering a heuristic must be . Because heuristic 11 has the worst performance for both datasets, we did not consider this heuristic in this experimentation phase.
Table 15 shows the ranks and the heuristic fitness for each class, e.g., class 1 for QT, and AFT has the same heuristics , , , , , and , while FT does not consider . We only consider , , , , , and as a minimum set.
The selected heuristics for each class can be summarized into five groups:(1), , , , and (2), , , , and (3), , , , , and (4), , , , and (5), , , and
Therefore, we designed 5 algorithms for these 8 classes. That is, the hyperheuristic and its high-level search strategy were the same, but the heuristic pool differed according to these 5 subsets of heuristics. We trained and tested the hyperheuristic on these short-length pools. We left 10% of the instances as unseen for the hyperheuristics, and the results are shown in Table 16.
The hyperheuristic configuration was 10 iterations for local search and 100,000 function calls. For some GCP instances, we obtained the optimum number of colors (denoted in bold in Table 16). In addition, for the instances A-n45-k6, A-n55-k9, A-n62-k8, A-n64-k9, CMT13, GWKC1, GWKC2, GWKC3, and X-n148-k46, we obtained values near the optimum, at a distance of at most 20%.
7.5. Classification of the Test Instances and Application of the Hyperheuristic to the Corresponding Instance
Finally, for the 14 unseen instances, we used the Naive Bayes classifier to determine the class of each instance. We then applied the hyperheuristic with the corresponding heuristic pool according to the previous design, and we obtained the results shown in Tables 17 and 18.
Table 17 shows the confusion matrix, TP rate, FP rate, and precision of the classification test. In these results, two patterns that belong to class 5 were classified incorrectly, but this does not affect the hyperheuristic solution because this class shares the same heuristics with class 6.
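The per-class measures named above can be derived from the true and predicted labels as in the following sketch. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example, and the labels in the usage note are made up for illustration rather than taken from Table 17.

```python
from collections import Counter

def per_class_metrics(true_labels, predicted_labels):
    """TP rate, FP rate, and precision of each class in a multiclass problem."""
    pairs = Counter(zip(true_labels, predicted_labels))  # confusion-matrix cells
    classes = sorted(set(true_labels) | set(predicted_labels))
    total = len(true_labels)
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        fn = sum(pairs[(c, p)] for p in classes if p != c)
        fp = sum(pairs[(t, c)] for t in classes if t != c)
        tn = total - tp - fn - fp
        metrics[c] = {
            "tp_rate": tp / (tp + fn) if tp + fn else 0.0,
            "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return metrics
```

For instance, with true labels `[5, 5, 5, 6, 6]` and predictions `[5, 6, 6, 6, 6]`, class 5 has a TP rate of 1/3 and a precision of 1, while class 6 has a TP rate of 1 and a precision of 1/2.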
7.6. Statistical Comparison of Results
Finally, to determine whether there are differences between the results obtained with and without the methodology, an experiment was carried out in which the hyperheuristics were executed 33 times with the entire set of heuristics and 100,000 function calls. The results are shown in Table 19.
First, the statistical distribution followed by each set of results was analyzed by class; that is, the Shapiro–Wilk test [53] was applied to determine whether the results of the methodology, of the hyperheuristic without the methodology (HHPC), and of the optimal state of the art followed a normal distribution. The test was applied with α = 0.05. The results of the test are shown in Table 20, and the data shown in Tables 16, 18, and 19 were taken for the tests. It should be noted that the results of the methodology were normal only for clusters 4 and 5, the results of the optimal state of the art were normal only for cluster 4, and the results of HHPC were normal for clusters 1, 4, 5, and 7.
Student's t-tests were applied to compare the methodology with HHPC and with the state of the art. In both comparisons, we established α = 0.05 as the level of significance. The null and alternative hypotheses for methodology and HHPC are as follows:
(i) H0: there are no differences between the performance of the hyperheuristic with the methodology and without the methodology
(ii) H1: there are differences between the performance of the hyperheuristic with the methodology and without the methodology
For methodology and state of the art,
(i) H0: there are no differences between the performance of the hyperheuristics and the optimal state of the art
(ii) H1: there are differences between the performance of the hyperheuristics and the optimal state of the art
The statistical results of the tests are shown in Table 21. With these values, we can observe the following:
(i) Methodology and HHPC. The results of the methodology are significantly different from those of the hyperheuristic with the whole set of heuristics. This means that the methodology improved performance and allowed the set of heuristics to be limited for each of the clusters.
(ii) Methodology and State of the Art. No statistical evidence was found that the results of the methodology differ from the optimal ones of the state of the art, except in clusters 5 and 7. These are the clusters with more atypical data or misclassified instances, which opens an area of opportunity for the refinement of the methodology.
8. Conclusion
In this work, a methodology is proposed to select low-level heuristics for a hyperheuristic approach with offline learning, oriented to the solution of instances of different constraint satisfaction problems. The proposal was applied to two different problems that are well known and studied in the state of the art: the graph coloring problem (GCP) and the capacitated vehicle routing problem (CVRP).
The methodology focuses on optimizing the number of heuristics that can be applied to different constraint satisfaction problems in a hyperheuristic approach. Information on the performance of an original set of heuristics is obtained for the instances of the different problems. This performance information is used to generate a characteristic vector for each instance, and these vectors are used to generate equivalence classes of instances. Grouping into classes makes it possible to identify the heuristics that apply to each class. From that information, the number of heuristics needed to obtain good solutions for the instances of each class is reduced, as is the total number of heuristics that can be applied in the hyperheuristic approach to solve the problems involved.
In the application to the GCP and CVRP, the information on the performance of the heuristics was obtained through a metalearning process, and this information was used to obtain the basic and inner features of the instances. The instances were grouped into 8 classes using the k-means algorithm with the Manhattan distance, as described in Section 7.2. For each class, the sets of heuristics that could be applied to all of its instances were identified, and through a process of ranking and cutoff criteria, the number of heuristics per class was reduced.
For training and testing, the Naive Bayes classifier and information on the characteristics of the instances were used. The experimental results show that the hyperheuristic in each class could efficiently solve each instance, and the classifier was able to predict the class for each problem instance.
The identification and reduction of heuristics for solving complex problems is an optimization strategy that can make the search for solutions to constraint satisfaction problems more efficient. The methodology presented makes it possible to generate a framework with a level of generality that can be trained to solve different constraint satisfaction problems simultaneously under the hyperheuristic approach. Once trained, it can find good solutions to different problems with a common base of heuristics for problem instances grouped by the efficiency of the solution heuristics.
Finally, the methodology makes it possible to improve the search for solutions to sets of problems by exploring the diversification of some of its components such as classification algorithms, metrics, heuristics, and selection criteria, which may be different for sets of different problems. A study of these possibilities is proposed as future work.
Data Availability
The instance data used to support the findings of this study have been deposited in the following CVRP and graph coloring repositories: http://vrp.atdlab.inf.pucrio.br/index.php/en/, http://archive.dimacs.rutgers.edu/pub/challenge/graph/benchmarks/color/, and https://neo.lcc.uma.es/vrp/vrpinstances/capacitatedvrpinstances/. They are also available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors thank Tecnológico Nacional de México/I. T. León and Universidad de Guanajuato. This work was supported by the National Council of Science and Technology of Mexico (CONACYT) via the Scholarship for Postgraduate Study 446106 (L. Ortiz) and Research Grant: CÁTEDRAS2598 (A. Rojas). The authors thank Valentin Calzada-Ledesma from Instituto Tecnológico Superior de Purísima for revising and correcting this article.