Computational Intelligence and Neuroscience

Volume 2016, Article ID 7349070, 15 pages

http://dx.doi.org/10.1155/2016/7349070

## Experimental Matching of Instances to Heuristics for Constraint Satisfaction Problems

National School of Engineering and Sciences, Tecnológico de Monterrey, Avenida Eugenio Garza Sada 2501 Sur, Colonia Tecnológico, 64849 Monterrey, NL, Mexico

Received 29 September 2015; Revised 16 December 2015; Accepted 27 December 2015

Academic Editor: Paul C. Kainen

Copyright © 2016 Jorge Humberto Moreno-Scott et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Constraint satisfaction problems are of special interest to the artificial intelligence and operations research communities due to their many applications. Although the heuristics involved in solving these problems have been studied extensively in the past, little is known about the relation between instances and the performance of the heuristics used to solve them. This paper focuses both on exploring the instance space to identify relations between instances and well-performing heuristics and on using such relations to improve the search. Firstly, the document describes a methodology to explore the instance space of constraint satisfaction problems and evaluate the corresponding performance of six variable ordering heuristics for such instances, in order to find regions of the instance space where some heuristics outperform the others. Analyzing such regions favors the understanding of how these heuristics work and contributes to their improvement. Secondly, we use the information gathered in the first stage to predict the most suitable heuristic to use according to the features of the instance currently being solved. This approach proved to be competitive when compared against the same heuristics applied in isolation on both randomly generated and structured instances of constraint satisfaction problems.

#### 1. Introduction

Combinatorial problems are recurrent in artificial intelligence and related areas. The current literature contains a significant amount of work that has focused on designing and implementing methods that successfully solve these problems by combining the strengths of existing algorithms to improve the performance. Examples of these methods include dynamic algorithm portfolios [1–3], selection hyperheuristics [4–6], and instance specific algorithm configuration (ISAC) [7]. In general, all these methods manage a set of algorithms (solvers, heuristics, or strategies) and apply one that is suitable to the current problem state of the instance being solved. Although different names have been used in the literature, from this point on, we will refer to these methods as algorithm selectors.

Algorithm selectors relate instances to one suitable strategy to be used during the search, based on its historical performance on similar instances. These methods have proven reliable for solving a much wider set of instances than any of the algorithms they select from. Algorithm selectors keep a record of the historical performance of different algorithms on a set of solved instances in order to estimate, based on the similarity between instances, the expected performance of those algorithms on unseen instances. To estimate how similar two instances are, algorithm selectors compare the values of a set of features that characterize the instances. With this, algorithm selection strategies define and maintain a mapping from instances to algorithms that is used to determine a suitable algorithm when a new instance is presented to the system. Unfortunately, the internal representation of this mapping is usually hard for humans to interpret, making it difficult to understand how algorithm selectors make their decisions.
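The selection scheme just described can be sketched as a simple nearest-neighbour selector. In this minimal sketch, the feature values (e.g., constraint density and tightness), the stored records, and the function names are hypothetical illustrations, not the implementation used in this paper.

```python
# Minimal sketch of a feature-based algorithm selector.
# The feature vectors and heuristic labels below are hypothetical.
import math

# Historical records: (instance feature vector, best-performing heuristic).
history = [
    ((0.30, 0.25), "DOM"),
    ((0.55, 0.60), "DEG"),
    ((0.80, 0.40), "DOMDEG"),
]

def select_heuristic(features, records, k=1):
    """Recommend the heuristic that performed best on the k recorded
    instances most similar (Euclidean distance over features) to the
    new one, by majority vote."""
    ranked = sorted(records, key=lambda r: math.dist(features, r[0]))
    votes = {}
    for feats, heuristic in ranked[:k]:
        votes[heuristic] = votes.get(heuristic, 0) + 1
    return max(votes, key=votes.get)

# The new instance is closest to the first record, so DOM is recommended.
print(select_heuristic((0.32, 0.28), history))  # DOM
```

A design choice worth noting: with `k = 1` the selector simply copies the label of the single most similar solved instance, which keeps the decision easy to trace back to one concrete precedent.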

In this investigation, we focus on analyzing the relation between instances and heuristics for the constraint satisfaction problem (CSP), which is one of the most studied combinatorial problems in the literature. A CSP consists of a set of variables $X = \{x_1, \dots, x_n\}$, where each variable $x_i$ must be assigned a value from its corresponding domain $D(x_i)$, and a set of constraints $C$ that restricts the values that subsets of variables can simultaneously take. The importance of CSPs lies in the fact that many combinatorial problems such as scheduling [8], radio link frequency assignment [9], and microcontroller selection/pin assignment [10] can be formulated as CSPs.

CSPs are usually solved by using backtracking-based algorithms [11]. These algorithms explore the space of solutions by using depth-first search, where every node in the search tree represents an assignment. The process starts with an empty variable assignment that is iteratively extended until a complete assignment that satisfies all the constraints is obtained or the instance is proven to be unsatisfiable [12]. These algorithms rely on a constructive approach that takes one variable at a time and considers only one value for it. Heuristics are usually applied to decide the next variable to instantiate and which value to use; these are commonly referred to as variable and value ordering heuristics, respectively. Once a variable is assigned a value, the search evaluates whether the assignment breaks one or more constraints. If that is the case, another value must be tried for that variable. If during the search any variable runs out of values, the algorithm backtracks and assigns a new value to the variable located at the backtracking position. Thus, once a failure has been detected, the algorithm undoes the current path by going back to upper levels until it reaches a variable whose value can be changed, and continues the search from that point. In general, the better the decisions of the heuristics, the smaller the cost of the search. However, heuristics are problem dependent, and their performance may vary significantly from one instance to another.
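The search scheme just described can be sketched as follows. The instance encoding (domains as lists, binary constraints as sets of allowed value pairs) and the use of DOM as the ordering heuristic are illustrative choices for this sketch, not the paper's implementation.

```python
# Sketch of backtracking search with a variable ordering heuristic.
# A binary CSP is encoded as domains (variable -> list of values) and
# constraints (pair of variables -> set of allowed value pairs).

def backtrack(domains, constraints, assignment, order_heuristic):
    if len(assignment) == len(domains):
        return assignment                       # complete, consistent assignment
    var = order_heuristic(domains, assignment)  # heuristic picks next variable
    for value in domains[var]:                  # values tried in default order
        assignment[var] = value
        if consistent(var, assignment, constraints):
            result = backtrack(domains, constraints, assignment, order_heuristic)
            if result is not None:
                return result
        del assignment[var]                     # undo and try the next value
    return None                                 # no value worked: backtrack

def consistent(var, assignment, constraints):
    """Check the new assignment against every constraint on `var`
    whose other variable is already assigned."""
    for (a, b), allowed in constraints.items():
        if var in (a, b) and a in assignment and b in assignment:
            if (assignment[a], assignment[b]) not in allowed:
                return False
    return True

# DOM as an example ordering heuristic: fewest remaining values first,
# with ties broken by the lexical order of the variable names.
def dom(domains, assignment):
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned, key=lambda v: (len(domains[v]), v))

domains = {"x": [1, 2], "y": [1, 2, 3], "z": [1, 2, 3]}
constraints = {("x", "y"): {(1, 2), (2, 3)}, ("y", "z"): {(2, 1), (3, 2)}}
print(backtrack(domains, constraints, {}, dom))  # {'x': 1, 'y': 2, 'z': 1}
```

Note how the undo step (`del assignment[var]`) implements the backtracking described above: when a failure is detected deeper in the tree, the recursion unwinds to the most recent variable that still has untried values.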

This study analyzes the behaviour of six variable ordering heuristics to identify the most suitable ones for specific regions of the CSP instance space and also which ones should not be used on certain areas of the space. The hypothesis is that we can use information from the individual performance of various variable ordering heuristics on a set of instances to produce easy-to-interpret rules to predict the performance of such heuristics on unseen instances. The overall goal of this investigation is to use the information collected about heuristics and their performance on various instances to produce an algorithm selector that recommends the most suitable heuristic to use according to the features of the instance at hand in order to minimize the cost of the search.

This paper provides insights into how to answer two important questions for the community: (1) given a heuristic, on which instances is it likely to perform well? (2) Given an instance, which heuristics are likely to perform well? These questions are carefully addressed by an extensive experimental setup that includes the analysis of six variable ordering heuristics on more than two hundred thousand instances. Finally, derived from this analysis, we propose a simple but useful algorithm selector that exploits the strengths of six different heuristics to improve the search. In summary, the main contributions of this paper are as follows:

(i) The analysis of the performance of six variable ordering heuristics on a large set of CSP instances, pointing out their strengths and limitations.

(ii) A methodology to identify suitable and unsuitable regions of the CSP instance space for specific heuristics.

(iii) Two algorithm selection strategies whose internal representation of the relation between instances and heuristics is simple for humans to interpret. These algorithm selectors choose one suitable heuristic according to the features of the instance being solved and the information obtained from the historical performance of the heuristics on similar instances.

(iv) Empirical evidence that including more than one heuristic when solving problem instances is not always beneficial for a heuristic selection strategy, as some heuristics may cancel the progress of others if used to solve the same problem at different stages of the search.

This paper is organized as follows. Section 2 introduces relevant works on the analysis of algorithms and algorithm selection through an exploration of the instance space of various combinatorial problems. A detailed description of the heuristics considered for this investigation is provided in Section 3. The analysis of the performance of variable ordering heuristics on the CSP instance space is outlined in Section 4. Section 5 describes the algorithm selection strategies proposed, the results obtained, and the discussion of these results. Finally, we present the conclusion and discuss some ideas for future work in Section 6.

#### 2. Background and Related Work

In general, the task of selecting the most suitable algorithm for a particular problem is referred to as the algorithm selection problem [13] and this concept has been applied to various problems in the past few years. Stützle and Fernandes [14] collected a large amount of quadratic assignment problem (QAP) instances to conduct a systematic study of the performance of some algorithms according to the features of the instances. Smith-Miles [15] proposed a framework for analyzing the performance of various algorithms for QAP instances to get insights into the relationship between instance space features and the performance of the algorithms evaluated. In a subsequent study, Smith-Miles et al. analyzed the performance of heuristics for the scheduling problem by using a decision tree [16]. To conduct the analysis, 75000 scheduling instances were generated and solved by using two common scheduling heuristics. The authors used a self-organizing map to visualize the feature space and the corresponding performance of the heuristics, in order to get insights into the heuristic performance. More recently, Smith-Miles et al. compared the strengths and weaknesses of different optimization algorithms on a broad range of graph coloring instances [17].

Bischl et al. tackled the algorithm selection problem as a classification task based on an exploratory landscape analysis [18]. The authors used systematic sampling of the instances to collect a set of features and used those features to predict a well-performing algorithm (in terms of expected runtime) out of a given portfolio. One-sided support vector regression was used to solve the resulting learning problem. López-Camacho et al. [19] applied principal component analysis as a knowledge discovery method to gain understanding on the structure of bin packing problems and how it relates to the performance of various heuristics for this problem.

With regard to CSPs, Tsang and Kwan [20] introduced the idea of systematically relating instances to suitable algorithms, based on the features of those instances. In that study, the authors presented a survey of algorithms for solving CSPs and established the first ideas that suggested that it was possible to relate the formulation of a CSP to one adequate solving algorithm for that formulation. This idea supports more recent algorithm selection approaches, like the ones described in the following lines. Ortiz-Bayliss et al. [21] studied the performance of two variable ordering heuristics on a large set of CSP instances. In their analysis, the authors found preliminary evidence that supports the idea that some heuristics for CSP can indeed be used in collaborative fashion to improve the search.

A preliminary version of this investigation was presented by Moreno-Scott et al. [22], where three heuristics were analyzed in much less detail than in this document. This investigation extends the previous study by including three more variable ordering heuristics in the analysis, formalizing the instance space characterization to help us identify regions of difficult and easy instances, and describing a simple but useful way to use the information from the analysis to predict a suitable heuristic that improves the search.

There is also a growing interest in the generation of particularly difficult or easy instances for testing algorithms, in order to understand when they are preferable to others. Usually, the generation of such instances is done by using evolutionary computation. The idea is to construct generation models that provide a more direct method for studying the relative strengths and weaknesses of each algorithm. Smith-Miles et al. proposed the use of an evolutionary algorithm to produce distinct classes of traveling salesman problem (TSP) instances that are intentionally easy or hard for certain algorithms [23, 24]. In their analysis, a comprehensive set of features is used to characterize the instances. By using the information gathered from the performance of these algorithms on the set of instances, the authors proposed a prediction algorithm that presents high accuracy on unseen instances for predicting search effort as well as identifying the algorithm likely to perform best.

For CSPs, van Hemert [25, 26] proposed a genetic algorithm to produce instances that are difficult to solve. van Hemert’s model maintains a population of binary CSPs whose structure it changes over time: its genetic operators modify the conflicts between pairs of variables. Under this approach, the set of variables and their domains are kept unchanged during the whole process; only the ratio of forbidden pairs of values can vary as a result of the evolutionary process. As part of the generation process, the algorithm requires solving the instances to evaluate their fitness. Moreno-Scott et al. [22] used van Hemert’s model to generate extremely hard instances for specific variable ordering heuristics. Among their main findings, the authors confirmed that instances that are hard to solve for some heuristics may not be hard for others.

#### 3. Variable Ordering Heuristics

Six dynamic variable ordering heuristics were considered for this investigation due to their performance in previous studies [5, 27–29]. Each heuristic assigns a score to the variables in the instance being solved, based on a specific criterion, as the search progresses. Every time a variable is to be selected for instantiation, the heuristic sorts the variables by their score in ascending or descending order (according to its particular strategy), and the first variable in the sorted list is selected for instantiation. In all cases, ties among variables are broken by using the lexical order of their names. Values are always tried in the default order in which they appear in the domain of the variable to instantiate.

Because instantiating a variable changes the problem state (as domains and constraints are updated), the scores given by any of the heuristics to the remaining variables are likely to change at different stages of the search. For this reason, the ordering of the variables is dynamic, deciding which variable to instantiate considering the current problem state and the current scores given by a particular heuristic.

The following lines describe the variable ordering heuristics used in this work:

(i) *Minimum Domain (DOM)*. DOM [30] instantiates first the variable with the fewest values in its domain; that is, DOM selects the variable that minimizes $|D(x_i)|$ among all the variables, where $|D(x_i)|$ is the current domain size of variable $x_i$.

(ii) *Maximum Degree (DEG)*. This heuristic considers the degree of the variables to decide which one to instantiate before the others. The degree $\deg(x_i)$ of a variable is calculated as the current number of constraints where the variable is involved. Thus, DEG instantiates first the variable with the largest $\deg(x_i)$ [31].

(iii) *Minimum Domain over Maximum Degree (DOMDEG)*. DOMDEG tries first the variable that minimizes the quotient $|D(x_i)| / \deg(x_i)$ among the remaining variables in the instance [32].

(iv) *Minimum Solution Density (RHO)*. This heuristic is based on the approximated calculation of the solution density of the CSP instance, $\rho = \prod_{c \in C}(1 - p_c)$ [5, 28]. Let $C(x_i)$ indicate the constraints in which variable $x_i$ is involved. Then, RHO will instantiate first the variable that minimizes $\prod_{c \in C(x_i)}(1 - p_c)$, where $p_c$ is the current fraction of forbidden pairs of values in constraint $c$.

(v) *Minimum Expected Solutions (SOL)*. SOL instantiates the variables in such a way that the resulting subproblem contains the maximum number of expected solutions, $E(N) = \rho \times \prod_{j} |D(x_j)|$ [5, 28]. To do so, the search branches on the variable that minimizes its contribution to $E(N)$, that is, $|D(x_i)| \times \prod_{c \in C(x_i)}(1 - p_c)$.

(vi) *Maximum Conflicts (MXC)*. This heuristic prefers the variable that maximizes the number of conflicts where it is currently involved [33]. A conflict represents a pair of values that is not allowed for two variables at the same time. A constraint between two variables $x_i$ and $x_j$ may contain zero or more conflicts (up to $|D(x_i)| \times |D(x_j)|$). The larger the number of conflicts in a constraint is, the more difficult it is to satisfy. MXC will select first the variable that maximizes $\sum_{c \in C(x_i)} \mathrm{conflicts}(c)$.
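As an illustration of these scoring criteria, the sketch below computes all six scores on a tiny hypothetical binary CSP. The domain sizes and forbidden-pair fractions are made up, and for MXC the number of conflicts in a constraint $c$ between $x_a$ and $x_b$ is recovered as $p_c \cdot |D(x_a)| \cdot |D(x_b)|$; this is a sketch under those assumptions, not the paper's code.

```python
# Illustrative scoring of the six variable ordering heuristics on a
# tiny hypothetical binary CSP (domain sizes and p_c values made up).
from math import prod

domains = {"x1": 3, "x2": 2, "x3": 4}        # current domain sizes |D(x)|
p = {("x1", "x2"): 0.25, ("x2", "x3"): 0.5}  # fraction of forbidden pairs p_c

def constraints_on(v):
    return [c for c in p if v in c]

def dom(v):    return domains[v]                                 # minimize
def deg(v):    return len(constraints_on(v))                     # maximize
def domdeg(v): return domains[v] / deg(v)                        # minimize
def rho(v):    return prod(1 - p[c] for c in constraints_on(v))  # minimize
def sol(v):    return domains[v] * rho(v)                        # minimize
def mxc(v):    # conflicts in c recovered as p_c * |D(x_a)| * |D(x_b)|; maximize
    return sum(p[c] * domains[c[0]] * domains[c[1]] for c in constraints_on(v))

def select(score, maximize=False):
    """Pick the best-scoring variable; ties broken by lexical order."""
    sign = -1 if maximize else 1
    return min(domains, key=lambda v: (sign * score(v), v))

for name, score, maximize in [("DOM", dom, False), ("DEG", deg, True),
                              ("DOMDEG", domdeg, False), ("RHO", rho, False),
                              ("SOL", sol, False), ("MXC", mxc, True)]:
    print(name, "->", select(score, maximize))
```

On this particular toy instance all six criteria happen to agree and select `x2`; the example in Figure 1 of the paper shows a case where the heuristics disagree.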

To help clarify how these heuristics make their decisions, a simple CSP instance is depicted in Figure 1 and analyzed by using each heuristic. Table 1 presents the scores given to the variables according to each heuristic. As the reader may observe, there are cases where different heuristics select the same variable. In this example, DOMDEG, RHO, and SOL will instantiate first, while DOM, DEG, and MXC will prefer , , and , respectively.