Mathematical Problems in Engineering

Special Issue

Meta-Heuristic Techniques for Solving Computational Engineering Problems


Research Article | Open Access

Volume 2020 |Article ID 5473501 | https://doi.org/10.1155/2020/5473501

Jose M. Lanza-Gutierrez, N. C. Caballe, Broderick Crawford, Ricardo Soto, Juan A. Gomez-Pulido, Fernando Paredes, "Exploring Further Advantages in an Alternative Formulation for the Set Covering Problem", Mathematical Problems in Engineering, vol. 2020, Article ID 5473501, 24 pages, 2020. https://doi.org/10.1155/2020/5473501

Exploring Further Advantages in an Alternative Formulation for the Set Covering Problem

Guest Editor: Dilbag Singh
Received 28 Feb 2020
Accepted 01 Jun 2020
Published 15 Jul 2020

Abstract

The set covering problem (SCP) is an NP-complete optimization problem that fits many problems in engineering. The traditional SCP formulation does not directly address two aspects: solution unsatisfiability and set redundancy. As a result, solving methods have to control both aspects to avoid unfeasible solutions and solutions that are not optimized in cost. In recent years, an alternative SCP formulation was proposed that covers both aspects directly. This alternative formulation received limited attention because managing both aspects is currently considered straightforward. This paper questions whether the alternative formulation offers any advantage beyond addressing these two issues. To this end, two studies based on a metaheuristic approach are proposed to identify whether any concept in the alternative formulation could enhance a solving method that considers the traditional SCP formulation. As a result, the authors conclude that there are concepts from the alternative formulation that could be applied for guiding the search process and for designing heuristic feasibility operators. Thus, such concepts could be recommended for designing state-of-the-art algorithms addressing the traditional SCP formulation.

1. Introduction

The set covering problem (SCP) is a classical problem shown to be NP-complete by Karp [1] and whose optimization version is NP-hard. Although this is a traditional problem, SCP is widely considered in the current scientific literature because it fits problems in relevant areas, such as engineering, vehicle routing, the medical domain, and facility allocation (see, e.g., [2–6]).

Most contributions in the SCP field considered the traditional SCP formulation introduced by Chvatal [7] and defined as follows. Let $E$ be a set of objects and let $S$ be a collection of subsets of $E$, where each subset has a non-negative cost associated. Then, the purpose of SCP is to get a minimum cost family of subsets from $S$, such that each element of $E$ belongs to at least one subset of the family. This traditional formulation does not directly deal with two aspects: solution unsatisfiability and set redundancy. The solution unsatisfiability aspect is related to the possibility of generating unfeasible solutions during the search. The set redundancy aspect is related to the possibility of generating nonoptimized solutions in cost, including redundant components (subsets). The noninclusion of these two aspects in the formulation means that the solving method has to address them to ensure good performance.

In recent years, Bilal et al. [8] proposed an alternative SCP formulation. Its main contribution was that both redundant sets and unfeasible solutions were directly penalized in the fitness function. Therefore, the solving method does not have to control such aspects, in contrast to the traditional SCP formulation. Nevertheless, the contribution of the alternative formulation becomes questionable due to two main issues. First, Vasko et al. [9] demonstrated that the calculation effort to remove redundant components from an SCP solution is almost negligible. Thus, including a redundancy removal operator in a solving method addressing the traditional formulation does not excessively increase the computational cost. Second, there are simple methods for transforming unfeasible solutions into feasible ones, such as the one proposed by Beasley and Chu [10]. As a consequence, the alternative formulation does not seem to be advantageous.

In their work on the alternative formulation, Bilal et al. [8] compared it to the traditional one. To this end, they solved the standard Beasley’s OR library [11] through two algorithms: a simple descent heuristic (DH) addressing the alternative formulation and a standard greedy heuristic (GH) addressing the traditional one. As a result, DH outperformed GH, which is valuable for justifying the alternative formulation. However, Vasko et al. [9] later applied the same GH using the traditional formulation on the same instances, but included a simple redundancy removal operator, obtaining better results than the ones shown in Bilal et al. [8] for DH. Once again, the alternative formulation seems to have questionable merit. However, the comparison initiated in Bilal et al. [8] might have some limitations:
(i) The heuristic techniques considered might not be the most appropriate according to the current state of the art, in which metaheuristics, especially swarm intelligence algorithms (SIAs), provide the best results in general.
(ii) The authors independently compared the two formulations using different algorithms. This focus could be correct because the algorithm best suited for a formulation could be very different, even in type, from the algorithm for the other formulation. However, there is no study combining aspects from the two formulations that could provide some advantages for the same solving method.
(iii) The authors did not consider any statistical method for comparing both formulations. Instead, they only compared the average solution obtained.

On this basis, this paper questions whether there is any advantage from the concepts involved in the alternative formulation, beyond the novel problem formulation. This idea leads us to propose two studies focused on solution quality as a way to know if there is any concept in the alternative formulation, which could be considered for enhancing a method using the traditional formulation. To this end, the authors select two different metaheuristics adequate for the studies, although other metaheuristics could have been selected without loss of generality.

Merely demonstrating the utility of concepts from the alternative formulation is valuable in itself, because future solving methods could then include these novel concepts. This research focus implies that the authors do not aim to get the best absolute results solving Beasley’s OR library, although the solutions obtained should be reasonable, as will be discussed later.

The first study focuses on identifying if there is any concept in the alternative formulation, which could be applied for guiding the search process of a solving method addressing the traditional formulation. To this end, the authors generate two versions of the same SIA addressing the traditional formulation. In the first version, (i), the search process of SIA is guided by concepts from the traditional formulation. In the second version, (ii), the search process of SIA is guided by concepts from the alternative formulation. Further details of this first study are as follows:
(i) This study requires a solving method, whose search process is closely linked to the optimization problem. The ant colony optimization (ACO) algorithm meets this requirement, being very sensitive to the heuristic information operator designed based on the problem to solve. Thus, two heuristic information operators are considered: in version (i), a usual operator based on the traditional formulation and in version (ii), a novel operator based on concepts from the alternative formulation.
(ii) The two ACO versions (i) and (ii) include the same usual operator for removing redundant sets. No operator for transforming unfeasible solutions into feasible ones is considered because ACO does not generate them.
(iii) The two ACO versions (i) and (ii) are applied for solving Beasley’s OR library. The results obtained were analysed through a widely accepted statistical method. Both approaches were tuned through the automatic method iterated F-Race, preventing errors from a manual method [12].

The second study focuses on identifying if there is any concept in the alternative formulation, which could be considered for designing the operators needed for transforming unfeasible solutions into feasible ones while removing redundant columns. Note that this type of operator is widely applied in most SIAs solving SCP. To this end, the authors generate two versions of the same SIA addressing the traditional formulation. In the first version, (iii), SIA considers the widely applied operator proposed in Beasley and Chu [10]. In the second version, (iv), SIA considers a novel operator inspired by concepts from the alternative formulation. Further details of this second study are as follows:
(i) The second study requires a solving method, which could generate unfeasible solutions. The artificial bee colony (ABC) is one of the many metaheuristics meeting this requirement. Thus, two different feasibility operators are considered in versions (iii) and (iv).
(ii) The two ABC versions (iii) and (iv) are applied for solving Beasley’s OR library. The results obtained were analysed through a widely accepted statistical method. In this case, the parameter configuration is taken from the literature (see [13]) because the feasibility operator is considered as an external tool.

To summarize, the motivation of this research is to identify if there is any concept from the alternative formulation, which could be used to solve the traditional SCP problem. To the best of our knowledge, this is the first work performing this research. Figure 1 summarizes the main tasks performed in the two studies. The contributions to the field are as follows:
(i) A first study is proposed to identify concepts of interest in the alternative SCP formulation, which could be applied for guiding a search method addressing the traditional SCP formulation. The study shows that the gain concept in the alternative SCP formulation is useful for guiding the search, outperforming the results obtained by a usual heuristic information operator from the literature.
(ii) A second study is proposed to identify concepts of interest in the alternative SCP formulation, which could be considered for designing feasibility operators. The study shows that the gain concept in the alternative SCP formulation is useful for designing this type of operator, outperforming a usual operator in the literature. This contribution is especially interesting because this type of operator is widely applied in the metaheuristic SCP field as a black-box method.

These two contributions are especially interesting for future works, implementing techniques for solving the traditional SCP formulation. From the first study, it is shown that the gain concept in the alternative SCP formulation is useful for guiding the search. That means that this concept could be considered during the design of novel solving methods for SCP. From the second study, it is shown that the gain concept is useful for designing feasibility operators. That means that this concept could be considered to improve techniques already shown to be useful for solving the problem, as well as proposing novel feasibility operators. As stated before, the second future scope line is especially interesting because feasibility operators are widely applied in the literature as black-box techniques outside the solving method.

The rest of this paper is structured as follows. Section 2 discusses related work. In Section 3, a formal statement of both SCP formulations is provided. In Section 4, the main aspects of the ACO algorithm in the first study are discussed. Section 5 discusses the main aspects of the ABC algorithm in the second study. In Section 6, the experimental methodology followed is discussed and the solution quality results are analysed. Finally, Section 7 concludes and introduces future works. Table 1 includes a summary of the notation considered throughout this work.


Cardinal of a given set.
Relative importance of pheromone trails, .
The set of columns that row covers, .
Binary matrix of m-rows and n-columns. The rows are the elements of the universe and the columns are the subsets .
Percentage of columns to be added during the generation of a solution in ABC.
Percentage of columns to be removed during the generation of a solution in ABC.
Heuristic factor of column at step for the alternative formulation, .
Value in the cell of . It equals 1 if the -th column covers the -th row and 0 otherwise, , .
Point/points in which a function gets its maximum value/values.
Point/points in which a function gets its minimum value/values.
The sum of the gains of covering the noncovered rows which could be covered by column at .
Relative importance of heuristic information, .
Set of costs, .
Heuristic factor of column at step for the classical formulation, .
Cost associated to the -th column, .
Cost of the cheapest set among the sets covering , , .
Number of noncovered rows which could be covered by column at step .
Solution generated by an ant at step , .
A solution to the problem, .
Ratio coefficient, .
Heuristic factor of column at step , .
Universe of elements, .
-th element of , , .
Average fitness evaluations needed for reaching the best solution found during the exploration.
Fitness function of the classical SCP formulation, .
Fitness function of the alternative SCP formulation, .
Gain of covering an element , , .
Row set, .
Row set covered by column , .
Percentage of improvement by considering the alternative approach of the algorithm instead of the default version.
Column set, .
The best solution found from the beginning of the algorithm, .
Threshold value based on trials for deciding if a worker bee is transformed into a scout one in ABC, .
Cardinal of or number of rows.
Cardinal of or number of columns.
Number of worker bees in ABC, .
Amount of pheromone put on column , .
Very small positive constant, .
Population size in ACO.
Population size in ABC.
Landscape solution quality metric, .
Random number uniformly distributed in .
Parameter determining the relative importance of exploitation versus exploration, .
Pheromone persistence, .
Set of unselected columns in , , .
Average RPD from a distribution, .
Average RPD of algorithm , with .
Average RPD of algorithm , with .
Average RPD of algorithm , with .
Average RPD of algorithm , with .
Minimum RPD from a distribution, .
Maximum RPD from a distribution, .
Set of subsets, , .
-th subset of , , , .
Landscape rate of success, .
Landscape speed of reaching a solution, .
Pheromone trail of column , .
Construction step in ACO, .
Average execution time of algorithm , with .
Average execution time of algorithm , with .
Average execution time of algorithm , with .
Average execution time of algorithm , with .
Number of uncovered rows in a solution, .
Set of uncovered rows in , , .
Number of columns in a solution that cover the row , .
An SCP solution expressed as binary vector, .
-th element of . It equals 1 if is part of the solution and 0 otherwise, .
Variable equaling 1 if the element is covered in and 0 otherwise, .
Column provided by SROM at step , .
Average cost obtained from a distribution, .
Maximum cost obtained from a distribution, .
Minimum cost obtained from a distribution, .
Optimum solution of a given instance, .
Column selected at step , .

2. Related Work

The literature about SCP is extensive. Some authors considered exact algorithms, such as branch-and-bound and branch-and-cut techniques [14–16] or linear programming [17–19]. More recently, Caprara et al. [20] compared several exact algorithms, concluding that the best exact technique was CPLEX.

It is well known that exact techniques require excessive computer resources on large problems. Therefore, much effort was focused on exploring heuristic and metaheuristic algorithms, which could find near-optimal (or even optimal) solutions for large problems in reasonable computing time.

Starting from heuristic methods, Chvatal [7] applied a classical GH. Although GHs are simple and fast to implement, they seldom produce good quality solutions. Some researchers tried to improve GHs by adding randomness (see, e.g., [9, 21–24]). Highly sophisticated heuristics based on Lagrangian relaxation were also considered, yielding very good solutions (see, e.g., [20, 25–27]). This brief review shows that the number of proposals considering heuristic methods for SCP has been limited in recent years, although proposing heuristics remains usual in other optimization problems (e.g., [28]). The situation is the opposite for metaheuristics, which have acquired great relevance during the last decades [29]. Thus, metaheuristics were applied to fields such as networking [30], biological ontology alignment [31], shop scheduling [32], chemical analysis [33], and image encryption [34].

Metaheuristics combine basic heuristic methods with an effective exploration of the search space. Such techniques are usually split into three large groups: evolutionary algorithms (EAs), trajectory algorithms (TAs), and SIAs. To solve SCP, some authors considered EAs (see, e.g., [10, 35–37]). Other authors applied TAs (see, e.g., [38, 39]). However, the most widely applied metaheuristics for solving SCP are SIAs. Some examples are the artificial bee colony (ABC) algorithm [13, 40, 41], the ant colony optimization (ACO) algorithm [42–44], the firefly algorithm (FA) [45, 46], the teaching-learning-based optimization (TLBO) algorithm [47, 48], the electromagnetism-like (EM-like) algorithm [49, 50], the shuffled frog leaping algorithm (SFLA) [51], the fruit fly optimization algorithm (FFOA) [52], the cuckoo search algorithm (CSA) [53, 54], the cat swarm optimization (CSO) algorithm [55, 56], the jumping particle swarm optimization (JPSO) method [57], the black hole optimization [54, 58], and the monkey search algorithm [59].

Analysing the previous contributions according to the results obtained, exact algorithms provided excellent results when solving small SCP problems. Focusing on larger SCP problems addressed by approximate techniques (heuristics and metaheuristics), heuristics do not provide results as good as those of the more sophisticated metaheuristics. Thus, the best results were usually obtained by SIAs. In this line, we should mention the valuable contributions of Naji-Azimi et al. [48, 49] and Balaji and Revathi [57] who got optimal or near-optimal solutions for classical SCP benchmarks.

All the works listed before have in common that they considered the traditional SCP formulation. On the contrary, the alternative formulation received limited attention. As far as the authors know, there are only two works considering the alternative formulation. Bilal et al. [60] solved an SCP variant through an iterated tabu-search metaheuristic. Crawford et al. [61] compared the results obtained solving the traditional and alternative formulations through the ACO algorithm.

The research presented in this paper was inspired by a very preliminary work discussed before (see Crawford et al. [61]). In this contribution, there is no study regarding the existence of concepts in the alternative formulation, which could be considered for solving methods addressing the traditional formulation. In Lanza-Gutierrez et al. [56], the authors applied an SIA to solve SCP by a CSO algorithm but with a completely different approach.

3. Set Covering Problem Statements

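Let $I = \{1, \ldots, m\}$ and $J = \{1, \ldots, n\}$ be the row and column sets, respectively. Let $E = \{e_1, \ldots, e_m\}$ be a universe of elements and let $S = \{S_1, \ldots, S_n\}$ be a collection of subsets of $E$, such that $S_j \subseteq E$ and $\bigcup_{j \in J} S_j = E$, with $j \in J$. Each subset $S_j$ has a non-negative cost associated $c_j \in C$, where $C = \{c_1, \ldots, c_n\}$.

The optimization problem is formally defined by assuming a binary matrix $A$ of $m$-rows and $n$-columns, where the rows are the elements of the universe and the columns are the subsets. Let $a_{ij}$ be the value in the cell $(i, j)$ of $A$ given by
$$a_{ij} = \begin{cases} 1 & \text{if } e_i \in S_j, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$
for $i \in I$ and $j \in J$, where $a_{ij} \in \{0, 1\}$. Thus,
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}. \tag{2}$$

The objective of SCP is to find a subset of $S$ covering (containing) all the elements of $E$ at a minimal cost. A solution to SCP is usually expressed as a binary vector $X = (x_1, \ldots, x_n)$, where
$$x_j = \begin{cases} 1 & \text{if } S_j \text{ is part of the solution,} \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Then, the cost of the solution is
$$\sum_{j \in J} c_j x_j. \tag{4}$$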

Next, we give a formal statement of the two SCP formulations.

3.1. Traditional Formulation

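The SCP fitness function is
$$f_c(X) = \sum_{j \in J} c_j x_j. \tag{5}$$

Then, given $m$ elements and $n$ subsets, the objective is to find a collection of subsets to
$$\text{minimize } f_c(X) \tag{6}$$
subject to
$$\sum_{j \in J} a_{ij} x_j \geq 1, \quad \forall i \in I, \tag{7}$$
$$x_j \in \{0, 1\}, \quad \forall j \in J. \tag{8}$$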

The constraint in equation (7) ensures that each row is covered by at least one column. If this constraint is not satisfied, the solution is considered unfeasible. The constraint in equation (8) is only for the integrity of the mathematical programming. Hence, this equation does not need to be addressed as a constraint in heuristic approaches.
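As an illustration of the traditional formulation, the following minimal Python sketch evaluates the cost of a binary solution vector and checks the covering constraint row by row. The function names and the toy instance are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List


def cost(costs: List[float], x: List[int]) -> float:
    """Classical fitness: sum of the costs of the selected columns."""
    return sum(c for c, xj in zip(costs, x) if xj == 1)


def is_feasible(a: List[List[int]], x: List[int]) -> bool:
    """Covering constraint: every row needs at least one selected column."""
    return all(sum(aij * xj for aij, xj in zip(row, x)) >= 1 for row in a)


# Hypothetical toy instance: 3 rows (elements) and 4 columns (subsets).
A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
C = [2, 3, 4, 1]

for x in ([1, 1, 0, 1], [0, 0, 1, 0]):
    print(x, cost(C, x), is_feasible(A, x))
```

In this toy instance, the first vector covers every row at cost 6, whereas the second one is cheaper (cost 4) but leaves the third row uncovered, so it would be treated as unfeasible under the traditional formulation.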

3.2. Alternative Formulation

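In this formulation, covering an element $e_i$ is identified with collecting a gain $g_i$ at a given cost. Let $c_i^{\min}$ be the cost of the cheapest set among the sets covering the element $e_i$ given by
$$c_i^{\min} = c_k, \quad k \in \arg\min_{j \in J:\, a_{ij} = 1} c_j, \tag{9}$$
where $\arg\min$ provides the point/points in which a function gets its minimum value/values. Then, the gain of covering an element $e_i$ is
$$g_i = c_i^{\min} + \varepsilon, \tag{10}$$
where $\varepsilon$ is a very small positive constant. Based on this gain concept, the SCP fitness function is
$$f_a(X) = \sum_{i \in I} g_i y_i - \sum_{j \in J} c_j x_j, \tag{11}$$
where
$$y_i = \begin{cases} 1 & \text{if } \sum_{j \in J} a_{ij} x_j \geq 1, \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

Then, given $m$ elements and $n$ subsets, the objective is to find a collection of subsets to
$$\text{maximize } f_a(X) \tag{13}$$
subject to
$$x_j \in \{0, 1\}, \quad \forall j \in J, \tag{14}$$
$$y_i \in \{0, 1\}, \quad \forall i \in I. \tag{15}$$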

The constraints in equations (14) and (15) are only for the integrity of the mathematical programming. Under this formulation, no solution is discarded as unfeasible, in contrast to the traditional formulation. Note that solutions leaving elements uncovered still exist for the problem. However, the alternative formulation penalizes them in the fitness function instead of discarding them. Moreover, it also penalizes redundant sets directly, beyond the higher cost that they already imply in the traditional formulation. Thus, redundancy removal operators are not needed, in contrast to the traditional formulation, where they are highly recommended.
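The penalizing effect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fitness is taken to be "collected gains minus paid costs" (larger is better), the gain of a row is the cost of its cheapest covering column plus a small constant, and the function names, the value `eps=0.01`, and the toy instance are hypothetical.

```python
def gains(a, costs, eps=0.01):
    """Gain of each row: cost of its cheapest covering column plus eps.

    Assumes every row is covered by at least one column of the instance.
    """
    return [min(c for aij, c in zip(row, costs) if aij == 1) + eps for row in a]


def alternative_fitness(a, costs, x, eps=0.01):
    """Assumed orientation: total gain of the covered rows minus total cost."""
    g = gains(a, costs, eps)
    covered = [1 if any(aij and xj for aij, xj in zip(row, x)) else 0 for row in a]
    total_gain = sum(gi * yi for gi, yi in zip(g, covered))
    total_cost = sum(c * xj for c, xj in zip(costs, x))
    return total_gain - total_cost


# Hypothetical toy instance: 3 rows, 4 columns.
A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
C = [2, 3, 4, 1]

feasible = alternative_fitness(A, C, [1, 1, 0, 1])    # covers everything, no redundancy
uncovered = alternative_fitness(A, C, [1, 1, 0, 0])   # third row left uncovered
redundant = alternative_fitness(A, C, [1, 1, 1, 1])   # column 2 is redundant
print(feasible, uncovered, redundant)
```

On this toy instance, the complete and non-redundant solution obtains the highest value, the solution with an uncovered row scores lower, and the redundant solution scores lowest, without any explicit feasibility check or redundancy removal step.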

4. Ant Colony Optimization

The ACO algorithm is inspired by ant colony behaviours. The ACO process is focused on the search for the optimal path in a graph based on an artificial ant colony. Ants work cooperatively, guided by problem-dependent heuristic information and by pheromone trails. Pheromone trails are a type of distributed information, which is dynamically updated by the ants. Pheromones keep the experience gained during the search process while highlighting promising areas of the search space.

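Let $X_t$ be the solution generated by an ant at construction step $t$. Let $U_t \subseteq I$ be the set of uncovered rows in $X_t$. Let $J_t \subseteq J$ be the set of unselected columns in $X_t$. Reviewing the scientific literature [42, 62], a usual heuristic information expression for a column $j$ at step $t$ is
$$\eta_j^{c}(t) = \frac{\mathrm{cov}_j(t)}{c_j}, \tag{16}$$
where $\mathrm{cov}_j(t)$ is the number of noncovered rows in $X_t$, which could be covered by column $j$ at step $t$. This value is
$$\mathrm{cov}_j(t) = |I_j \cap U_t|, \tag{17}$$
where $|\cdot|$ is the cardinal of a set and $I_j$ denotes the row set covered by column $j$, for $j \in J_t$ and $I_j \subseteq I$.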

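In this work, we propose a heuristic information inspired by the gain concept from the alternative formulation introduced in Section 3.2 as
$$\eta_j^{a}(t) = \frac{G_j(t)}{c_j}, \tag{18}$$
where $G_j(t)$ is the sum of the gains of covering the noncovered rows in the current solution by column $j$ at step $t$. Thus,
$$G_j(t) = \sum_{i \in I_j \cap U_t} g_i, \tag{19}$$
where $g_i$ is given in equation (10), $I_j$ is the row set covered by column $j$, and $U_t$ is the set of uncovered rows at step $t$.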
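The difference between the usual heuristic information and the gain-based one can be illustrated with a short Python sketch. It assumes that both factors are ratios over the column cost (number of newly covered rows, or sum of their gains, divided by the cost) and that costs are strictly positive; the function name and the toy data are hypothetical.

```python
def heuristic_information(j, uncovered, rows_of, costs, gains=None):
    """Heuristic factor of column j given the set of still-uncovered rows.

    gains=None  -> usual form: (number of uncovered rows j would cover) / c_j
    gains=[...] -> gain-based form: (sum of the gains of those rows) / c_j
    Assumes c_j > 0; zero-cost columns would need special handling.
    """
    newly_covered = uncovered & rows_of[j]
    if gains is None:
        return len(newly_covered) / costs[j]
    return sum(gains[i] for i in newly_covered) / costs[j]


# Hypothetical toy instance: 3 rows, 4 columns.
rows_of = {0: {0}, 1: {1}, 2: {0, 1}, 3: {2}}  # rows covered by each column
costs = [2, 3, 4, 1]
gains = [2.01, 3.01, 1.01]                     # cheapest covering cost + 0.01
uncovered = {0, 1, 2}

classical = {j: heuristic_information(j, uncovered, rows_of, costs) for j in rows_of}
gain_based = {j: heuristic_information(j, uncovered, rows_of, costs, gains) for j in rows_of}
print(max(classical, key=classical.get))    # column preferred by the usual factor
print(max(gain_based, key=gain_based.get))  # column preferred by the gain-based factor
```

On this toy instance, the usual factor prefers the cheap column 3, which covers a single row, whereas the gain-based factor prefers column 2, which covers two rows whose cheapest alternatives are comparatively expensive. This is only meant to show that the two factors can rank columns differently, not to reproduce the paper's results.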

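To simplify the notation, we define the heuristic information $\eta_j(t)$ for a column $j$ at step $t$ based on whether we consider the traditional formulation or the alternative one. That is,
$$\eta_j(t) = \begin{cases} \eta_j^{c}(t) & \text{for the traditional formulation,} \\ \eta_j^{a}(t) & \text{for the alternative formulation.} \end{cases} \tag{20}$$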

Algorithm 1 shows the procedure of a general ACO. Next, the main steps are detailed.

(1) Initialization: in the beginning, we propose to preprocess the SCP instances by using column domination and column inclusion [18]. Next, the algorithm parameters are initialized. Traditionally, ACO algorithms do not include an initialization step to generate the solutions in the population. Instead, pheromone trails are randomly assigned and then solutions are generated according to this random information. That means that the algorithm could need to run some iterations before having the right information about the solution component quality. At this point, we propose to include a greedy population initialization step in ACO based on Lu and Vasko [48]. This step corresponds to line 1 of Algorithm 1.

(2) Solution construction method: each ant starts with an empty solution where columns are added iteratively until all rows are covered. Consequently, this strategy causes all solutions generated to be feasible. Most ACO-based algorithms consider a similar state transition rule, preferring solution components with high pheromone and heuristic values (see, e.g., [42, 62]). A possible way to generate solutions is the single row oriented method (SROM) proposed by Ren et al. [43]. In that work, it was demonstrated that SROM reduces the computation burden compared to other methods. Thus, SROM is used in this paper as the solution construction method. Additionally, we also consider the ant colony system (ACS) proposed by Dorigo and Gambardella [63] as an extension of the ACO algorithm. ACS includes a pseudo-random-proportional rule, providing a direct way to balance between exploration and exploitation during the selection of the solution component. If $j_t$ denotes the column selected at step $t$, then the ACS rule is
$$j_t = \begin{cases} \arg\max_{j \in J_t} \left\{ [\tau_j]^{\alpha} [\eta_j(t)]^{\beta} \right\} & \text{if } q \leq q_0, \\ j_t^{\mathrm{SROM}} & \text{otherwise,} \end{cases} \tag{21}$$
where $J_t$ is the set of unselected columns at step $t$, $q$ is a random number uniformly distributed in $[0, 1]$, $q_0$ is a parameter determining the relative importance of exploitation versus exploration, $j_t^{\mathrm{SROM}}$ is the column provided by SROM at step $t$, and $\arg\max$ provides the point/points in which a function gets its maximum value/values. Thus, if $q \leq q_0$, then it returns the nonselected column having the highest value of $[\tau_j]^{\alpha} [\eta_j(t)]^{\beta}$ at step $t$, where $\tau_j$ denotes the pheromone trail of column $j$ and $\alpha$ and $\beta$ denote the relative importance of pheromone trails and heuristic information, respectively. This step corresponds to line 4 of Algorithm 1.

(3) Local search: it is well known that local search is effective to improve ACO performance. We consider the local search proposed by Ren et al. [43], where for each column in the solution, the algorithm determines if the column should be removed or replaced by one or more columns while keeping solution feasibility. This step corresponds to line 5 of Algorithm 1.

(4) Update pheromone trails: we consider that pheromone trails are updated based on the max-min ant system (MMAS) approach proposed by Stützle and Hoos [64]. In this method, after each ant generates a full solution, all pheromone trails are decreased uniformly to simulate evaporation, forgetting part of the historical experience. Next, a small amount of pheromone is deposited on the columns corresponding to the best solution found. To this end, MMAS can consider either the best solution found in the current iteration or the best solution found from the beginning of the algorithm. We opted for the second option, as did Ren et al. [43]. Thus, the search can concentrate fast around the best solution found. This strategy could result in a bad performance if the algorithm is trapped in bad solution areas. However, this risk is reduced due to the ACS strategy detailed in Step 2. Formally, pheromone trails are updated following
$$\tau_j \leftarrow \rho\, \tau_j + \Delta\tau_j, \quad \forall j \in J, \tag{22}$$
where $\rho$ is the pheromone persistence and $\Delta\tau_j$ is the amount of pheromone put on column $j$ provided in Stützle and Hoos [64]. Additionally, they also proposed that pheromone trails are kept within a range $[\tau_{\min}, \tau_{\max}]$, whose bounds are set from a ratio coefficient and from the best solution found from the beginning of the algorithm. This step corresponds to line 7 of Algorithm 1.

Algorithm 1: General procedure of ACO.
Initialize
while not stop condition do
  for each ant in the nest do
   generate a new solution
   apply local search to the new solution
  end for
  update pheromone trails
end while
return the best solution found
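The selection rule and the pheromone update described in Steps 2 and 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the column proposed by SROM is simply passed in as an argument, and the deposit amount and the trail bounds are given as plain numbers rather than computed.

```python
import random


def acs_select(candidates, tau, eta, alpha, beta, q0, srom_column, rng=random):
    """Pseudo-random-proportional rule.

    With probability q0, exploit: pick the candidate column maximizing
    tau^alpha * eta^beta. Otherwise, fall back to the column proposed by
    the construction method (here an input, assumed to come from SROM).
    """
    if rng.random() <= q0:
        return max(candidates, key=lambda j: (tau[j] ** alpha) * (eta[j] ** beta))
    return srom_column


def update_pheromone(tau, best_columns, rho, delta_tau, tau_min, tau_max):
    """Evaporate all trails, deposit on the best solution, clamp to the bounds."""
    for j in tau:
        deposit = delta_tau if j in best_columns else 0.0
        tau[j] = min(tau_max, max(tau_min, rho * tau[j] + deposit))
    return tau


# Hypothetical usage with two candidate columns.
tau = {0: 1.0, 1: 2.0}
eta = {0: 1.0, 1: 1.0}
chosen = acs_select([0, 1], tau, eta, alpha=1, beta=2, q0=0.9, srom_column=0)
tau = update_pheromone(tau, best_columns={chosen}, rho=0.9, delta_tau=0.2,
                       tau_min=0.1, tau_max=5.0)
print(chosen, tau)
```

Passing `rng` explicitly makes the rule easy to test with a fixed random source; clamping after every update keeps each trail inside the allowed range, which is the mechanism MMAS uses to avoid premature stagnation.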

5. Artificial Bee Colony

The ABC algorithm is inspired by honey bee behaviours, the search process being guided by three types of artificial bees: workers, onlookers, and scouts. The general procedure of ABC is shown in Algorithm 2. It starts by generating an initial population of solutions: for every row, a random column covering that row is selected until all rows are covered. Next, along the iterations, the population is managed by workers and onlookers, which are randomly recruited in each iteration. The behaviour of each bee is as follows:
(i) A worker takes a random solution from the population and generates a new solution by adding a random number of columns, between 0 and a given percentage of the columns in the SCP instance. This step is followed by the elimination of a random number of columns, again between 0 and a given percentage of the columns in the SCP instance. The fitness value of the individual generated by the worker is then obtained. If the fitness value of the new individual is better than that of the previous individual assigned to the worker, the new individual replaces the previous one and the counter of trials for improving the current solution is set to zero. Otherwise, the counter is increased. If the counter reaches the threshold, the worker is transformed into a scout bee.
(ii) An onlooker generates a new solution following a similar procedure as for workers, but selecting the solution with a probability proportional to its quality, instead of randomly. The concept of the threshold is not used in onlookers.
(iii) A scout discards its current solution and generates a new one by following the same strategy as for generating the initial population. As expected, the counter of trials is initialized to zero.

Algorithm 2: General procedure of ABC.
generate initial solutions
while not stop condition do
  recruit worker bees
  recruit onlooker bees
  recruit scout bees if needed
  update the best solution found
end while
return the best solution found
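The worker behaviour can be sketched as a single function. The sketch below is an illustration under stated assumptions: a solution is stored as a set of column indices, lower fitness is better, the two percentages are passed as `add_pct` and `remove_pct`, and `repair` stands for whatever feasibility operator is plugged in (the identity by default). All names are hypothetical.

```python
import random


def worker_step(solution, n_columns, add_pct, remove_pct, fitness, trials, limit,
                repair=lambda s: s, rng=random):
    """One worker move; returns (solution, trials, becomes_scout)."""
    candidate = set(solution)
    # Add between 0 and add_pct percent of the instance's columns, at random.
    n_add = rng.randint(0, int(n_columns * add_pct / 100))
    outside = [j for j in range(n_columns) if j not in candidate]
    candidate.update(rng.sample(outside, min(n_add, len(outside))))
    # Remove between 0 and remove_pct percent of the instance's columns, at random.
    n_remove = rng.randint(0, int(n_columns * remove_pct / 100))
    inside = sorted(candidate)
    candidate.difference_update(rng.sample(inside, min(n_remove, len(inside))))
    # Random removal may uncover rows, so the feasibility operator runs first.
    candidate = repair(candidate)
    if fitness(candidate) < fitness(solution):
        return candidate, 0, False      # improvement: keep it, reset the counter
    trials += 1                         # no improvement: one more failed trial
    return set(solution), trials, trials >= limit


# Hypothetical usage: 10 columns, fitness = number of selected columns.
current, trials, scout = worker_step({0, 1, 2}, 10, add_pct=20, remove_pct=20,
                                     fitness=len, trials=0, limit=5,
                                     rng=random.Random(0))
print(current, trials, scout)
```

An onlooker could reuse the same function after choosing its starting solution with a quality-proportional probability, ignoring the returned scout flag.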

As both workers and onlookers can generate unfeasible solutions because of the random elimination of columns, it is mandatory to manage this issue. Crawford et al. [13] proposed to consider the usual heuristic by Beasley and Chu [10] for transforming unfeasible solutions into feasible ones while reducing the cost of the solution in a later step. This heuristic is shown in Algorithm 3. Here, the first stage in lines 1–7 transforms an unfeasible solution into a feasible one. The second stage in lines 8–12 removes redundant columns. In this algorithm, note that $S$, $\alpha_i$, and $w_i$ are the set of columns in a solution, the set of columns that cover row $i$, and the number of columns in $S$ that cover the row $i$, respectively.

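Algorithm 3: Heuristic for making a solution feasible and removing redundant columns.
initialize $w_i = |S \cap \alpha_i|$, $\forall i \in I$
initialize $U = \{ i \mid w_i = 0, \forall i \in I \}$
for each row $i$ in $U$ in increasing order of $i$ do
  find the first column $j$ in $\alpha_i$ minimizing equation (23)-default or (24)-proposal
  add $j$ to $S$
  set $w_i = w_i + 1$, $\forall i \in I_j$, and $U = U - I_j$
end for
for each column $j$ in $S$ in decreasing order of $j$ do
  if $w_i \geq 2$, $\forall i \in I_j$ then
   set $S = S - \{j\}$ and $w_i = w_i - 1$, $\forall i \in I_j$
  end if
end for
return $S$, which is a feasible SCP solution, without redundant columns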

Focusing on the first stage, the steps required to make a solution feasible include the identification of uncovered rows and the addition of columns to the solution so that all rows are covered. The search for the missing columns in the proposal of Beasley and Chu [10] is guided by
$$\frac{c_j}{|U \cap I_j|}, \tag{23}$$
where $U$ is the set of noncovered rows in the solution and $I_j$ is the row set covered by column $j$, i.e., the ratio between the cost of a column and the number of noncovered rows, which could be covered by such column.

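As an alternative strategy, this paper proposes to guide the search based on the concepts from the alternative SCP formulation, that is,
$$\frac{c_j}{\sum_{i \in U \cap I_j} g_i}, \tag{24}$$
with $U$ the set of noncovered rows, $I_j$ the row set covered by column $j$, and $g_i$ the gain of covering row $i$, i.e., the ratio between the cost of a column and the sum of the gains of covering the noncovered rows by such column.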
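The two-stage operator and its two guiding ratios can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: solutions are sets of column indices, `rows_of[j]` is the set of rows covered by column `j`, `cols_of[i]` lists the columns covering row `i` in increasing order, and passing `gains` switches from the default ratio to the gain-based one. The function name and the toy instance are hypothetical.

```python
def make_feasible(solution, rows_of, cols_of, costs, gains=None):
    """Add columns until every row is covered, then drop redundant columns."""
    S = set(solution)
    m = len(cols_of)
    # w[i]: number of selected columns covering row i; U: uncovered rows.
    w = [sum(1 for j in cols_of[i] if j in S) for i in range(m)]
    U = {i for i in range(m) if w[i] == 0}

    def ratio(j):
        newly = U & rows_of[j]
        denom = len(newly) if gains is None else sum(gains[i] for i in newly)
        return costs[j] / denom

    # Stage 1: cover each uncovered row with the column minimizing the ratio.
    for i in range(m):
        if i not in U:
            continue
        j = min(cols_of[i], key=ratio)  # min() keeps the first column on ties
        S.add(j)
        for r in rows_of[j]:
            w[r] += 1
        U -= rows_of[j]

    # Stage 2: remove columns whose rows are all covered at least twice.
    for j in sorted(S, reverse=True):
        if all(w[r] >= 2 for r in rows_of[j]):
            S.remove(j)
            for r in rows_of[j]:
                w[r] -= 1
    return S


# Hypothetical toy instance: 3 rows, 4 columns.
rows_of = {0: {0}, 1: {1}, 2: {0, 1}, 3: {2}}
cols_of = [[0, 2], [1, 2], [3]]
costs = [2, 3, 4, 1]
gains = [2.01, 3.01, 1.01]  # cheapest covering cost + 0.01

default = make_feasible(set(), rows_of, cols_of, costs)
proposal = make_feasible(set(), rows_of, cols_of, costs, gains)
print(sorted(default), sum(costs[j] for j in default))
print(sorted(proposal), sum(costs[j] for j in proposal))
```

On this toy instance, the default ratio ends with columns 0, 1, and 3 (cost 6), while the gain-based ratio selects columns 2 and 3 (cost 5). A single hand-made instance says nothing about general performance; it only shows how the two ratios can lead the same operator to different feasible solutions.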

6. Experimentation

This section discusses the experimental methodology and analyses the results obtained in the first and second studies.

6.1. Experimental Methodology

We apply the two approaches in each study for ACO and ABC algorithms to solve Beasley’s OR library. This dataset is widely used to report empirical results in the current literature (see, e.g., [9, 40, 48]). This library includes 65 non-unicost instances generated randomly, as detailed in Table 2. For further details about the random generation of these instances, see [18, 65]. For each instance in the library, the number of rows, the number of columns, and the cost of each column are provided. Additionally, for each row, the number of columns that cover it and the list of those columns are provided. For a complexity study of the search space in this benchmark, we refer readers to the work by Finger et al. [66], which considered the fitness-distance correlation landscape metric to this end. In Table 2, “Density (%)” contains the percentage of 1s in the matrix $A$ in equation (2). “Optimal solution” shows two possible values, known and unknown, according to whether the instances have a solution proven to be optimal or whether optimality could not be checked because of problem complexity. Thus, we only know the best historical solutions found for the sets nrg and nrh.


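| Instance set | Number of instances | m | n | Cost range | Density (%) | Optimal solution |
|---|---|---|---|---|---|---|
| 4 | 10 | 200 | 1,000 | [1, 100] | 2.00 | Known |
| 5 | 10 | 200 | 2,000 | [1, 100] | 2.00 | Known |
| 6 | 5 | 200 | 1,000 | [1, 100] | 5.00 | Known |
| a | 5 | 300 | 3,000 | [1, 100] | 2.00 | Known |
| b | 5 | 300 | 3,000 | [1, 100] | 5.00 | Known |
| c | 5 | 400 | 4,000 | [1, 100] | 2.00 | Known |
| d | 5 | 400 | 4,000 | [1, 100] | 5.00 | Known |
| nre | 5 | 500 | 5,000 | [1, 100] | 10.00 | Known |
| nrf | 5 | 500 | 5,000 | [1, 100] | 20.00 | Known |
| nrg | 5 | 1000 | 10,000 | [1, 100] | 2.00 | Unknown |
| nrh | 5 | 1000 | 10,000 | [1, 100] | 5.00 | Unknown |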
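As an illustration of the instance data described above, the following sketch reads an instance from text and computes its density. It assumes a whitespace-separated layout in the order given in the text (number of rows, number of columns, column costs, then for each row the number of covering columns followed by their 1-based indices); the function names `parse_scp` and `density` are hypothetical.

```python
from typing import List, Tuple


def parse_scp(text: str) -> Tuple[int, int, List[int], List[List[int]]]:
    """Parse an SCP instance from whitespace-separated tokens (assumed layout)."""
    tokens = iter(text.split())
    m = int(next(tokens))          # number of rows
    n = int(next(tokens))          # number of columns
    costs = [int(next(tokens)) for _ in range(n)]
    rows: List[List[int]] = []
    for _ in range(m):
        k = int(next(tokens))      # number of columns covering this row
        # Convert the assumed 1-based column indices to 0-based.
        rows.append([int(next(tokens)) - 1 for _ in range(k)])
    return m, n, costs, rows


def density(m: int, n: int, rows: List[List[int]]) -> float:
    """Percentage of ones in the m x n coverage matrix."""
    return 100.0 * sum(len(r) for r in rows) / (m * n)


sample = "3 4\n2 3 1 5\n2\n1 2\n1\n3\n3\n1 3 4\n"
m, n, costs, rows = parse_scp(sample)
sample_density = density(m, n, rows)
```

The toy instance has 6 ones in a 3 x 4 matrix, so its density is 50%, which is the same quantity reported per instance set in the "Density (%)" column.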

We combine two stop criteria for performing the experimentation: reaching a given number of fitness evaluations or obtaining the optimal solution. If at least one condition holds, the algorithm ends. For ACO, we assume 10,000 fitness evaluations as a stop condition. As we will discuss later, this value is enough for performing the experimentation. For ABC, we consider 500 iterations based on Crawford et al. [13].

Before running the experimentation, we should configure both algorithms. In the case of ABC, we can assume the parameters provided in Crawford et al. [13] for the two approaches of ABC considered here because (i) the authors also solved SCP and (ii) the approach based on the alternative formulation only modifies the heuristic operator for solution feasibility and the operators guiding the search are not modified. In the case of ACO, we should configure the two approaches of ACO considered here because (i) we do not have any set of parameters from previous works for the approach of ACO used and (ii) the approach based on the alternative formulation modifies how the search is performed in comparison to the traditional one, and then we should configure the two approaches independently.

Thus, for the first study, we consider the parameter values of Crawford et al. [13]. For the second study, we obtain the parameters of the two ACO approaches using F-Race. This method configures a metaheuristic starting with a set of candidate values for each parameter. Then, it discards poorly performing configurations as soon as sufficient statistical evidence is gathered against them, focusing on the most promising ones.

Concretely, we consider the iterated F-Race implementation for R software by López-Ibáñez et al. [67]. Following the authors’ recommendations, we divided the benchmark into three groups according to the problem size to get a consistent configuration. Thus, Group A includes instance sets 4, 5, and 6; Group B includes instance sets a, b, c, and d; and, finally, Group C includes instance sets nre, nrf, nrg, and nrh. Table 3 shows the candidate values for each parameter based on previous works [43] and the configurations obtained for each group and ACO approach. Note that d-ACO denotes ACO with the traditional heuristic information expression and n-ACO denotes ACO with the alternative heuristic information expression.


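| Parameter | Candidate values | d-ACO (A) | d-ACO (B) | d-ACO (C) | n-ACO (A) | n-ACO (B) | n-ACO (C) |
|---|---|---|---|---|---|---|---|
| | 0.25, 0.50, 1.00, 2.00, 5.00, 8.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 1.00, 2.00, 5.00, 8.00, 10.00, 13.00 | 8.00 | 8.00 | 5.00 | 8.00 | 8.00 | 8.00 |
| | 0.80, 0.85, 0.90, 0.95, 0.98, 0.99, 0.995, 0.999 | 0.95 | 0.90 | 0.90 | 0.98 | 0.99 | 0.90 |
| | 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99 | 0.50 | 0.75 | 0.50 | 0.90 | 0.90 | 0.25 |
| | 0.0001, 0.0005, 0.001, 0.002, 0.005, 0.010 | 0.001 | 0.005 | 0.005 | 0.005 | 0.005 | 0.001 |
| | 5, 10, 20, 50, 100, 150, 200, 250 | 20 | 10 | 100 | 150 | 150 | 50 |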

Once both ACO approaches are configured, 30 independent runs are performed for each instance and algorithm. Next, we analyse if there are significant differences between the behaviour of the two algorithms regarding solution quality and execution time for each instance. To this end, the authors consider the Wilcoxon–Mann–Whitney test [68] to validate several hypotheses. The implementation of this test is the one provided in the assessment performance tool described in Knowles et al. [69] and available in Fonseca et al. [70].
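To make the kind of comparison concrete, here is a minimal sketch of a one-sided Wilcoxon–Mann–Whitney test in pure Python. It is not the implementation from [69, 70]; it assumes the usual normal approximation with tie and continuity corrections, and the function name `mann_whitney_less` is hypothetical.

```python
import math
from typing import Sequence


def mann_whitney_less(x: Sequence[float], y: Sequence[float]) -> float:
    """Approximate one-sided p-value for H1: values in x tend to be lower than in y."""
    n1, n2 = len(x), len(y)
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    # Assign midranks to tied values and accumulate the tie correction term.
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        for k in range(i, j):
            ranks[k] = midrank
        tie_term += t ** 3 - t
        i = j

    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0       # U statistic of the first sample
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                      # all observations identical
        return 0.5
    z = (u1 - mean_u + 0.5) / math.sqrt(var_u)  # continuity correction
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_better = mann_whitney_less([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
p_equal = mann_whitney_less([3.0] * 30, [3.0] * 30)
```

In this sketch, two identical samples give a p-value of 0.5 in either direction, while a clearly lower first sample gives a p-value below the 0.05 significance level.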

As a solution quality metric, we consider the relative percentage deviation (RPD), which evaluates how far the solution found by the metaheuristic is from the optimal solution known in the literature. The lower the RPD value, the better the solution obtained. The optimal solutions for the instances can be considered as the solutions provided by the corresponding exact techniques. Thus, indirectly, this metric evaluates the performance of the metaheuristic in comparison with the corresponding exact technique solving the same problem instance. Three RPD metrics are included: the average RPD, $RPD_{avg}$; the minimum RPD, $RPD_{min}$; and the maximum RPD, $RPD_{max}$. They are calculated as

$$RPD_{avg} = \frac{Z_{avg} - Z_{opt}}{Z_{opt}} \times 100, \qquad RPD_{min} = \frac{Z_{min} - Z_{opt}}{Z_{opt}} \times 100, \qquad RPD_{max} = \frac{Z_{max} - Z_{opt}}{Z_{opt}} \times 100, \tag{25}$$

where $Z_{avg}$, $Z_{min}$, and $Z_{max}$ denote the average solution cost, the minimum solution cost, and the maximum solution cost from a distribution of 30 samples solving an instance, respectively, and $Z_{opt}$ is the optimum solution cost of the instance. Note that the cost of the best solution found during one run is given by equation (4). Thus, although both SCP fitness formulations are different, we can compare the results obtained without loss of generality.
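A minimal sketch of the RPD computation follows, assuming the usual definition (deviation from the optimum cost, divided by the optimum cost, times 100). The function names are hypothetical, and the arithmetic mean is used here as the average.

```python
from statistics import mean
from typing import Dict, Sequence


def rpd(cost: float, optimum: float) -> float:
    """Relative percentage deviation of a solution cost from the optimum cost."""
    return 100.0 * (cost - optimum) / optimum


def rpd_summary(costs: Sequence[float], optimum: float) -> Dict[str, float]:
    """Average, minimum, and maximum RPD over the costs of several runs."""
    return {
        "avg": rpd(mean(costs), optimum),
        "min": rpd(min(costs), optimum),
        "max": rpd(max(costs), optimum),
    }


# Toy example: three runs on an instance whose optimum cost is 429.
summary = rpd_summary([429, 430, 431], 429)
```

For instance, a solution of cost 430 on an instance with optimum 429 yields an RPD of about 0.23%, and a run that reaches the optimum yields 0%.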

Regarding the computing platform, the authors used two computing nodes in a computing cluster. Each node has two 2.33 GHz Intel Xeon E5410 processors with four cores each and 16 GB of 1600 MHz DDR3 RAM, running a Linux operating system. All executions were performed on a single core without parallelism because the goal of this paper is not to explore parallelism. The reason for using such an unconventional infrastructure is that it allows many independent executions to be performed, as needed by the statistical tests required to validate the proposal. That is, for a single execution or a reduced set of them, a conventional computer would suffice. To prevent operating system tasks from affecting the total computing time obtained during the experimentation, one core in each computing node was left idle. Additionally, the authors checked that the RAM in the computing node was sufficient to avoid memory swapping. As expected, the computing power of the processor definitely affects the time required to find the solution to the problem, since most of the operations are related to CPU computing and accessing the main memory (RAM). Note that the same computing nodes are used for all the experiments in this work so as not to bias the conclusions reached regarding computing time.

Regarding programming languages, the ACO algorithm was fully implemented in Java for Java Development Kit (JDK) 1.7. The ABC algorithm was fully implemented in C. The scripts for managing the executions and collecting the results were implemented in bash. Note that the usage of two different programming languages for implementing ABC and ACO does not affect the conclusions reached regarding computing time because ABC and ACO computing times are not compared in this work.

6.2. Analysis of the Experimental Results

Tables 4 and 5 show, for each instance and case study, the RPD metrics (average, minimum, and maximum), the average execution time reaching the stop condition, and the average number of fitness evaluations needed to reach the best solution found during the exploration. In both tables, the lower values are given in bold for each instance. In Table 5, d-ABC denotes ABC with the default heuristic feasibility operator and n-ABC denotes ABC with the heuristic feasibility operator based on the alternative SCP formulation.


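| Instance | Optimum / best known | d-ACO evaluations | d-ACO time | d-ACO avg RPD (%) | d-ACO min RPD (%) | d-ACO max RPD (%) | n-ACO evaluations | n-ACO time | n-ACO avg RPD (%) | n-ACO min RPD (%) | n-ACO max RPD (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4_1 | 429 | 1880 | 845 | 0.23 | 0.23 | 0.23 | 1350 | 109 | 0.00 | 0.00 | 0.00 |
| 4_2 | 512 | 3950 | 900 | 0.20 | 0.00 | 0.59 | 2700 | 224 | 0.00 | 0.00 | 0.00 |
| 4_3 | 516 | 1600 | 916 | 0.97 | 0.19 | 1.16 | 4200 | 704 | 0.19 | 0.00 | 0.78 |
| 4_4 | 494 | 4850 | 880 | 0.40 | 0.20 | 0.81 | 2100 | 829 | 0.20 | 0.20 | 0.20 |
| 4_5 | 512 | 1950 | 680 | 0.10 | 0.00 | 0.39 | 600 | 340 | 0.00 | 0.00 | 0.39 |
| 4_6 | 560 | 3780 | 574 | 0.00 | 0.00 | 0.36 | 2625 | 240 | 0.00 | 0.00 | 0.00 |
| 4_7 | 430 | 1430 | 858 | 0.70 | 0.00 | 1.16 | 1425 | 118 | 0.00 | 0.00 | 0.00 |
| 4_8 | 492 | 2440 | 415 | 0.00 | 0.00 | 0.20 | 1350 | 97 | 0.00 | 0.00 | 0.00 |
| 4_9 | 641 | 5120 | 932 | 1.25 | 0.31 | 2.34 | 5550 | 848 | 0.31 | 0.00 | 0.78 |
| 4_10 | 514 | 1390 | 115 | 0.00 | 0.00 | 0.00 | 750 | 60 | 0.00 | 0.00 | 0.00 |
| 5_1 | 253 | 4860 | 891 | 0.40 | 0.00 | 1.58 | 3000 | 526 | 0.00 | 0.00 | 1.19 |
| 5_2 | 302 | 4970 | 945 | 1.49 | 0.66 | 2.32 | 2700 | 870 | 1.32 | 0.33 | 1.99 |
| 5_3 | 226 | 2080 | 753 | 0.22 | 0.00 | 1.33 | 1725 | 125 | 0.00 | 0.00 | 0.00 |
| 5_4 | 242 | 2780 | 245 | 0.00 | 0.00 | 0.00 | 1500 | 110 | 0.00 | 0.00 | 0.00 |
| 5_5 | 211 | 1160 | 857 | 0.47 | 0.47 | 0.47 | 750 | 223 | 0.00 | 0.00 | 0.47 |
| 5_6 | 213 | 640 | 59 | 0.00 | 0.00 | 0.00 | 300 | 25 | 0.00 | 0.00 | 0.00 |
| 5_7 | 293 | 5490 | 901 | 0.51 | 0.34 | 1.02 | 1575 | 837 | 0.34 | 0.34 | 0.34 |
| 5_8 | 288 | 3820 | 653 | 0.35 | 0.00 | 0.69 | 1650 | 122 | 0.00 | 0.00 | 0.00 |
| 5_9 | 279 | 208 | 23 | 0.00 | 0.00 | 0.36 | 975 | 78 | 0.00 | 0.00 | 0.00 |
| 5_10 | 265 | 2370 | 617 | 0.38 | 0.00 | 0.75 | 1350 | 112 | 0.00 | 0.00 | 0.00 |
| 6_1 | 138 | 3050 | 802 | 1.45 | 0.00 | 1.45 | 2175 | 332 | 0.00 | 0.00 | 1.45 |
| 6_2 | 146 | 1180 | 122 | 0.00 | 0.00 | 0.00 | 2025 | 164 | 0.00 | 0.00 | 0.00 |
| 6_3 | 145 | 1530 | 128 | 0.00 | 0.00 | 0.00 | 900 | 70 | 0.00 | 0.00 | 0.00 |
| 6_4 | 131 | 930 | 70 | 0.00 | 0.00 | 0.00 | 300 | 22 | 0.00 | 0.00 | 0.00 |
| 6_5 | 161 | 1130 | 831 | 1.24 | 0.00 | 2.48 | 2025 | 378 | 0.00 | 0.00 | 1.86 |
| a_1 | 253 | 2725 | 904 | 0.79 | 0.79 | 0.79 | 4875 | 976 | 0.40 | 0.40 | 0.40 |
| a_2 | 252 | 4700 | 908 | 1.19 | 1.19 | 1.19 | 3000 | 967 | 0.40 | 0.00 | 0.79 |
| a_3 | 232 | 2925 | 903 | 0.43 | 0.43 | 0.43 | 5700 | 971 | 0.43 | 0.43 | 0.43 |
| a_4 | 234 | 1575 | 895 | 0.43 | 0.43 | 0.43 | 3150 | 444 | 0.00 | 0.00 | 0.43 |
| a_5 | 236 | 1815 | 902 | 0.42 | 0.42 | 0.42 | 3675 | 478 | 0.00 | 0.00 | 0.85 |
| b_1 | 69 | 360 | 45 | 0.00 | 0.00 | 0.00 | 450 | 45 | 0.00 | 0.00 | 0.00 |
| b_2 | 76 | 370 | 37 | 0.00 | 0.00 | 0.00 | 900 | 93 | 0.00 | 0.00 | 0.00 |
| b_3 | 80 | 320 | 34 | 0.00 | 0.00 | 0.00 | 300 | 33 | 0.00 | 0.00 | 0.00 |
| b_4 | 79 | 380 | 38 | 0.00 | 0.00 | 0.00 | 450 | 44 | 0.00 | 0.00 | 0.00 |
| b_5 | 72 | 10 | 1 | 0.00 | 0.00 | 0.00 | 150 | 16 | 0.00 | 0.00 | 0.00 |
| c_1 | 227 | 2690 | 973 | 1.32 | 0.88 | 1.76 | 6225 | 1034 | 0.88 | 0.88 | 0.88 |
| c_2 | 219 | 2780 | 979 | 0.91 | 0.91 | 0.91 | 3975 | 1056 | 0.91 | 0.91 | 0.91 |
| c_3 | 243 | 3305 | 993 | 1.65 | 0.82 | 2.47 | 5775 | 1067 | 0.82 | 0.41 | 1.23 |
| c_4 | 219 | 2400 | 235 | 0.00 | 0.00 | 0.00 | 3900 | 380 | 0.00 | 0.00 | 0.00 |
| c_5 | 215 | 1305 | 979 | 0.47 | 0.00 | 0.93 | 2925 | 1052 | 0.47 | 0.47 | 0.47 |
| d_1 | 60 | 415 | 1174 | 1.67 | 0.00 | 1.67 | 2025 | 473 | 0.00 | 0.00 | 0.00 |
| d_2 | 66 | 1675 | 236 | 0.00 | 0.00 | 0.00 | 1800 | 350 | 0.00 | 0.00 | 0.00 |
| d_3 | 72 | 775 | 1198 | 1.39 | 1.39 | 1.39 | 975 | 1019 | 1.39 | 0.00 | 1.39 |
| d_4 | 62 | 235 | 25 | 0.00 | 0.00 | 0.00 | 150 | 20 | 0.00 | 0.00 | 0.00 |
| d_5 | 61 | 600 | 63 | 0.00 | 0.00 | 0.00 | 675 | 75 | 0.00 | 0.00 | 0.00 |
| nre_1 | 29 | 100 | 29 | 0.00 | 0.00 | 0.00 | 50 | 18 | 0.00 | 0.00 | 0.00 |
| nre_2 | 30 | 1650 | 902 | 0.00 | 0.00 | 3.33 | 1400 | 423 | 0.00 | 0.00 | 0.00 |
| nre_3 | 27 | 900 | 216 | 0.00 | 0.00 | 0.00 | 700 | 170 | 0.00 | 0.00 | 0.00 |
| nre_4 | 28 | 1200 | 452 | 0.00 | 0.00 | 3.57 | 1000 | 260 | 0.00 | 0.00 | 0.00 |
| nre_5 | 28 | 200 | 49 | 0.00 | 0.00 | 0.00 | 100 | 31 | 0.00 | 0.00 | 0.00 |
| nrf_1 | 14 | 200 | 118 | 0.00 | 0.00 | 0.00 | 200 | 112 | 0.00 | 0.00 | 0.00 |
| nrf_2 | 15 | 100 | 64 | 0.00 | 0.00 | 0.00 | 75 | 40 | 0.00 | 0.00 | 0.00 |
| nrf_3 | 14 | 100 | 5464 | 7.14 | 0.00 | 7.14 | 50 | 4966 | 7.14 | 0.00 | 7.14 |
| nrf_4 | 14 | 500 | 271 | 0.00 | 0.00 | 0.00 | 200 | 122 | 0.00 | 0.00 | 0.00 |
| nrf_5 | 13 | 100 | 5511 | 7.69 | 7.69 | 7.69 | 100 | 5614 | 7.69 | 7.69 | 7.69 |
| nrg_1 | 176 | 5750 | 1905 | 0.28 | 0.00 | 2.27 | 5350 | 1250 | 0.00 | 0.00 | 0.00 |
| nrg_2 | 154 | 4800 | 2166 | 1.95 | 1.95 | 1.95 | 3425 | 2171 | 1.30 | 0.65 | 1.95 |
| nrg_3 | 166 | 4350 | 2253 | 3.61 | 3.01 | 4.82 | 2775 | 2253 | 3.01 | 2.41 | 4.22 |
| nrg_4 | 168 | 5250 | 2207 | 2.68 | 1.19 | 3.57 | 4100 | 2217 | 2.38 | 1.19 | 2.98 |
| nrg_5 | 168 | 6450 | 2258 | 0.60 | 0.60 | 0.60 | 5500 | 1830 | 0.00 | 0.00 | 0.60 |
| nrh_1 | 63 | 3650 | 5499 | 3.17 | 1.59 | 4.76 | 3525 | 5768 | 3.17 | 3.17 | 3.17 |
| nrh_2 | 63 | 3300 | 5437 | 3.17 | 3.17 | 3.17 | 2100 | 5706 | 3.17 | 3.17 | 3.17 |
| nrh_3 | 59 | 4200 | 5511 | 3.39 | 3.39 | 3.39 | 2750 | 5776 | 3.39 | 3.39 | 3.39 |
| nrh_4 | 58 | 3600 | 5434 | 3.45 | 1.72 | 3.45 | 2200 | 5664 | 3.45 | 1.72 | 3.45 |
| nrh_5 | 55 | 4250 | 2329 | 0.00 | 0.00 | 0.00 | 2975 | 1708 | 0.00 | 0.00 | 0.00 |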


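| Instance | Optimum / best known | d-ABC evaluations | d-ABC time | d-ABC avg RPD (%) | d-ABC min RPD (%) | d-ABC max RPD (%) | n-ABC evaluations | n-ABC time | n-ABC avg RPD (%) | n-ABC min RPD (%) | n-ABC max RPD (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4_1 | 429 | 125,808 | 485 | 0.82 | 0.23 | 0.93 | 124,122 | 447 | 0.23 | 0.23 | 0.23 |
| 4_2 | 512 | 9998 | 93 | 0.00 | 0.00 | 0.00 | 8194 | 76 | 0.00 | 0.00 | 0.00 |
| 4_3 | 516 | 56027 | 390 | 0.00 | 0.00 | 0.19 | 11,191 | 103 | 0.00 | 0.00 | 0.00 |
| 4_4 | 494 | 128,399 | 535 | 0.20 | 0.00 | 0.61 | 126,209 | 488 | 0.20 | 0.20 | 0.20 |
| 4_5 | 512 | 11,161 | 102 | 0.00 | 0.00 | 0.00 | 118,961 | 354 | 0.39 | 0.00 | 0.39 |
| 4_6 | 560 | 127,988 | 537 | 0.36 | 0.00 | 0.54 | 28,428 | 254 | 0.00 | 0.00 | 0.36 |
| 4_7 | 430 | 110,922 | 207 | 0.93 | 0.00 | 1.16 | 110,224 | 194 | 0.70 | 0.47 | 0.70 |
| 4_8 | 492 | 126,117 | 491 | 0.20 | 0.20 | 0.20 | 34,292 | 283 | 0.00 | 0.00 | 0.20 |
| 4_9 | 641 | 128,322 | 526 | 0.94 | 0.00 | 2.18 | 128,276 | 523 | 0.47 | 0.00 | 0.78 |
| 4_10 | 514 | 18,162 | 164 | 0.00 | 0.00 | 0.00 | 55,670 | 333 | 0.00 | 0.00 | 0.58 |
| 5_1 | 253 | 125,020 | 1797 | 0.40 | 0.00 | 1.58 | 123,792 | 1706 | 0.40 | 0.00 | 0.40 |
| 5_2 | 302 | 127,193 | 1956 | 1.99 | 1.66 | 2.98 | 125,859 | 1856 | 0.99 | 0.00 | 1.32 |
| 5_3 | 226 | 123,812 | 1713 | 1.33 | 1.33 | 1.33 | 123,627 | 1696 | 0.88 | 0.88 | 0.88 |
| 5_4 | 242 | 17,751 | 650 | 0.00 | 0.00 | 0.00 | 6760 | 239 | 0.00 | 0.00 | 0.00 |
| 5_5 | 211 | 14,945 | 521 | 0.00 | 0.00 | 0.00 | 7927 | 277 | 0.00 | 0.00 | 0.00 |
| 5_6 | 213 | 19,299 | 684 | 0.00 | 0.00 | 0.00 | 5575 | 200 | 0.00 | 0.00 | 0.00 |
| 5_7 | 293 | 126,401 | 1924 | 0.34 | 0.34 | 1.02 | 14,753 | 547 | 0.00 | 0.00 | 0.00 |
| 5_8 | 288 | 17,153 | 652 | 0.00 | 0.00 | 0.00 | 23,534 | 814 | 0.00 | 0.00 | 0.00 |
| 5_9 | 279 | 124,770 | 1754 | 0.36 | 0.36 | 0.36 | 8961 | 316 | 0.00 | 0.00 | 0.00 |
| 5_10 | 265 | 86,539 | 1149 | 0.19 | 0.00 | 1.13 | 12,431 | 407 | 0.00 | 0.00 | 0.00 |
| 6_1 | 138 | 124,417 | 1198 | 1.45 | 0.00 | 2.17 | 91,269 | 916 | 0.36 | 0.00 | 1.45 |
| 6_2 | 146 | 8784 | 217 | 0.00 | 0.00 | 0.00 | 6381 | 156 | 0.00 | 0.00 | 0.00 |
| 6_3 | 145 | 14,699 | 359 | 0.00 | 0.00 | 0.00 | 4776 | 115 | 0.00 | 0.00 | 0.00 |
| 6_4 | 131 | 16,928 | 400 | 0.00 | 0.00 | 0.00 | 3961 | 95 | 0.00 | 0.00 | 0.00 |
| 6_5 | 161 | 13,760 | 338 | 0.00 | 0.00 | 0.00 | 4377 | 106 | 0.00 | 0.00 | 0.00 |
| a_1 | 253 | 123,573 | 5729 | 0.40 | 0.40 | 0.79 | 122,222 | 5381 | 0.40 | 0.40 | 0.40 |
| a_2 | 252 | 126,016 | 6309 | 1.39 | 0.79 | 2.38 | 126,212 | 6346 | 0.79 | 0.00 | 1.19 |
| a_3 | 232 | 126,103 | 6334 | 1.29 | 0.86 | 1.72 | 125,826 | 6189 | 0.43 | 0.00 | 0.86 |
| a_4 | 234 | 125,908 | 6287 | 0.43 | 0.43 | 0.43 | 124,897 | 6004 | 0.43 | 0.43 | 0.43 |
| a_5 | 236 | 124,785 | 5948 | 0.42 | 0.42 | 0.42 | 11,144 | 1346 | 0.00 | 0.00 | 0.00 |
| b_1 | 69 | 120,832 | 13,012 | 1.45 | 0.00 | 1.45 | 3970 | 1228 | 0.00 | 0.00 | 0.00 |
| b_2 | 76 | 6962 | 2054 | 0.00 | 0.00 | 0.00 | 3372 | 1043 | 0.00 | 0.00 | 0.00 |
| b_3 | 80 | 4170 | 1296 | 0.00 | 0.00 | 0.00 | 3572 | 1106 | 0.00 | 0.00 | 0.00 |
| b_4 | 79 | 6166 | 1932 | 0.00 | 0.00 | 0.00 | 3965 | 1227 | 0.00 | 0.00 | 0.00 |
| b_5 | 72 | 3172 | 988 | 0.00 | 0.00 | 0.00 | 2774 | 857 | 0.00 | 0.00 | 0.00 |
| c_1 | 227 | 121,978 | 12,589 | 0.88 | 0.44 | 1.76 | 116,551 | 9431 | 0.88 | 0.88 | 0.88 |
| c_2 | 219 | 110,309 | 13,899 | 0.23 | 0.00 | 1.37 | 21,664 | 6492 | 0.00 | 0.00 | 0.00 |
| c_3 | 243 | 126,728 | 15,246 | 1.65 | 0.41 | 2.47 | 126,018 | 14,881 | 0.82 | 0.00 | 2.06 |
| c_4 | 219 | 126,196 | 14,951 | 0.46 | 0.00 | 1.83 | 28,355 | 7658 | 0.00 | 0.00 | 0.00 |
| c_5 | 215 | 19,075 | 5434 | 0.00 | 0.00 | 0.00 | 14,546 | 4086 | 0.00 | 0.00 | 0.00 |
| d_1 | 60 | 5970 | 4455 | 0.00 | 0.00 | 0.00 | 7572 | 5603 | 0.00 | 0.00 | 0.00 |
| d_2 | 66 | 121,272 | 31,555 | 1.52 | 1.52 | 1.52 | 11,343 | 8371 | 0.00 | 0.00 | 0.00 |
| d_3 | 72 | 122,829 | 34,063 | 1.39 | 1.39 | 2.78 | 18,808 | 13,510 | 0.00 | 0.00 | 0.00 |
| d_4 | 62 | 121,976 | 32,383 | 1.61 | 1.61 | 1.61 | 4768 | 3510 | 0.00 | 0.00 | 0.00 |
| d_5 | 61 | 109,457 | 14,077 | 1.64 | 1.64 | 1.64 | 5090 | 3690 | 0.00 | 0.00 | 0.00 |
| nre_1 | 29 | 27,684 | 68,066 | 0.00 | 0.00 | 0.00 | 3479 | 9634 | 0.00 | 0.00 | 0.00 |
| nre_2 | 30 | 119,319 | 111,213 | 6.67 | 6.67 | 6.67 | 18,666 | 51,982 | 0.00 | 0.00 | 3.33 |
| nre_3 | 27 | 120,070 | 115,028 | 7.41 | 7.41 | 7.41 | 19,339 | 53,470 | 0.00 | 0.00 | 0.00 |
| nre_4 | 28 | 119,063 | 109,812 | 10.71 | 3.57 | 14.29 | 10,559 | 30,055 | 0.00 | 0.00 | 0.00 |
| nre_5 | 28 | 118,585 | 106,781 | 3.57 | 3.57 | 7.14 | 4053 | 11,203 | 0.00 | 0.00 | 0.00 |
| nrf_1 | 14 | 119,200 | 230,417 | 14.29 | 14.29 | 14.29 | 2353 | 13,741 | 0.00 | 0.00 | 0.00 |
| nrf_2 | 15 | 119,831 | 238,705 | 13.33 | 13.33 | 13.33 | 1950 | 11,337 | 0.00 | 0.00 | 0.00 |
| nrf_3 | 14 | 119,332 | 236,611 | 14.29 | 14.29 | 21.43 | 3929 | 23,123 | 0.00 | 0.00 | 0.00 |
| nrf_4 | 14 | 119,910 | 241,666 | 14.29 | 14.29 | 14.29 | 2746 | 16,074 | 0.00 | 0.00 | 0.00 |
| nrf_5 | 13 | 106,580 | 79,141 | 15.38 | 15.38 | 23.08 | 120,731 | 248,033 | 7.69 | 7.69 | 7.69 |
| nrg_1 | 176 | 128,776 | 257,664 | 2.84 | 2.27 | 3.41 | 126,471 | 235,636 | 1.14 | 0.57 | 1.70 |
| nrg_2 | 154 | 128,678 | 253,564 | 3.25 | 2.60 | 3.90 | 126,160 | 233,168 | 1.95 | 1.30 | 2.60 |
| nrg_3 | 166 | 134,516 | 243,567 | 3.61 | 3.01 | 4.22 | 124,477 | 217,781 | 2.41 | 1.81 | 3.01 |
| nrg_4 | 168 | 135,789 | 223,456 | 3.57 | 2.98 | 4.17 | 125,268 | 224,757 | 1.79 | 1.19 | 2.38 |
| nrg_5 | 168 | 126,782 | 227,346 | 4.17 | 3.57 | 4.76 | 126,077 | 232,047 | 2.38 | 2.38 | 2.98 |
| nrh_1 | 63 | 126,421 | 228,326 | 3.17 | 3.17 | 4.76 | 145,678 | 253,467 | 1.59 | 1.59 | 3.17 |
| nrh_2 | 63 | 134,282 | 217,456 | 3.17 | 4.76 | 4.76 | 155,671 | 243,457 | 1.59 | 1.59 | 3.17 |
| nrh_3 | 59 | 123,481 | 237,222 | 3.39 | 5.08 | 5.08 | 132,451 | 233,245 | 1.69 | 1.69 | 3.39 |
| nrh_4 | 58 | 143,562 | 242,456 | 3.45 | 3.45 | 5.17 | 142,457 | 231,002 | 1.72 | 1.72 | 3.45 |
| nrh_5 | 55 | 134,612 | 232,216 | 3.64 | 3.64 | 3.64 | 143,417 | 224,452 | 1.82 | 1.82 | 3.64 |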

Analysing both tables regarding RPD metrics, we observe that (i) n-ACO seems to outperform or match d-ACO in most instances and (ii) n-ABC seems to outperform or match d-ABC in most instances. Focusing on computing times, we observe a similar behaviour, where n-ACO appears to need a shorter time than d-ACO, except for b, c, and nrh instances, and n-ABC appears to need a shorter time than d-ABC in general. Focusing on evaluations, this field is related to the number of iterations reached as follows. In ACO, a fixed number of evaluations is performed in each iteration. In ABC, the number of evaluations performed in each iteration varies between a base value and twice that value. Thus, for ABC, the maximum number of evaluations is not a single value but lies within a range. For ACO, the maximum number of evaluations is 10,000, as defined before. Analysing the evaluations field, we find that the number of evaluations needed is far from the stop condition defined, so the stop condition is adequate in both studies.

Table 6 shows the average RPD metrics for each instance group, where the “ipv” field denotes the percentage of improvement obtained by considering the alternative approach of the algorithm instead of the default version. Analysing this table, it is observed that (i) n-ACO provides better RPD values than d-ACO for all the groups and (ii) n-ABC also provides better RPD values than its default version. The RPD metrics obtained are in line with other works from the literature, with RPD values lower than 1.0%. In this regard, Table 7 shows the RPD values of some recent successful approaches solving the problem. However, we should remark that the purpose of this work is not to outperform other techniques solving the standard SCP benchmark.
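The “ipv” field can be read as a relative reduction of the RPD. The sketch below assumes that reading; the function name `improvement` is hypothetical, and recomputing from the rounded table entries reproduces the reported percentages only approximately (for example, about 25.5 from 0.94 and 0.70, against a reported 25.69), plausibly because the published values come from unrounded data.

```python
def improvement(default_rpd: float, alternative_rpd: float) -> float:
    """Percentage of improvement of the alternative approach over the default one.

    Assumed definition: relative reduction of the RPD, in percent.
    """
    if default_rpd == 0:
        raise ValueError("improvement is undefined when the default RPD is zero")
    return 100.0 * (default_rpd - alternative_rpd) / default_rpd


# Example with rounded average RPD values over all instances (first study).
ipv_all = improvement(0.94, 0.70)
```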


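First study

| Instance group | Avg RPD d-ACO (%) | Avg RPD n-ACO (%) | ipv (%) | Min RPD d-ACO (%) | Min RPD n-ACO (%) | ipv (%) | Max RPD d-ACO (%) | Max RPD n-ACO (%) | ipv (%) |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.43 | 0.08 | 81.80 | 0.08 | 0.03 | 63.83 | 0.79 | 0.43 | 46.03 |
| B | 0.53 | 0.28 | 46.60 | 0.36 | 0.17 | 51.89 | 0.62 | 0.39 | 37.24 |
| C | 1.86 | 1.74 | 6.54 | 1.22 | 1.17 | 3.74 | 2.49 | 1.89 | 24.06 |
| ALL | 0.94 | 0.70 | 25.69 | 0.55 | 0.46 | 17.20 | 1.30 | 0.90 | 30.60 |

Second study

| Instance group | Avg RPD d-ABC (%) | Avg RPD n-ABC (%) | ipv (%) | Min RPD d-ABC (%) | Min RPD n-ABC (%) | ipv (%) | Max RPD d-ABC (%) | Max RPD n-ABC (%) | ipv (%) |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.38 | 0.19 | 51.25 | 0.16 | 0.07 | 56.65 | 0.66 | 0.30 | 54.25 |
| B | 0.74 | 0.19 | 74.56 | 0.50 | 0.09 | 82.81 | 1.11 | 0.29 | 73.77 |
| C | 10.71 | 1.07 | 90.01 | 9.91 | 0.90 | 90.95 | 12.75 | 1.48 | 88.39 |
| ALL | 3.94 | 0.48 | 87.80 | 3.52 | 0.35 | 90.04 | 4.84 | 0.69 | 85.73 |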


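| Author/s | SIA | Reported RPD values (%) |
|---|---|---|
| Farahani et al. [41] | ABC | 0.16, 0.01 |
| Ren et al. [43] | ACO | 0.20, 0.03, 0.39 |
| Lu and Vasko [48] | TLBO20 | 0.06 |
| Lu and Vasko [48] | TLBO10 | 0.09 |
| Naji-Azimi et al. [49] | EM-like | 0.20 |
| Lu and Vasko [48] | TLBO | 0.28 |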

At this point, it seems that the alternative approach of the algorithms provides better performance in both cases. However, we do not know if the differences observed are significant. To this end, the statistical methodology described by Lanza-Gutierrez et al. [56] was applied. First, we removed all possible outliers. Then, we analysed the normality of the data, finding that we cannot assume a normal distribution in any case. Consequently, the median should be considered as the average value in equation (25).

Next, we study if there are significant differences in the solution quality of the algorithms. Starting with the first study, we consider the Wilcoxon–Mann–Whitney test with a null hypothesis and a unilateral alternative hypothesis on the average RPD values of two algorithms for a given instance, at a significance level of 0.05, i.e., a confidence level of 0.95. The p-values obtained for each instance and ACO approach are shown in Table 8 under the title RPD analysis, where values lower than the significance level are given in bold. Note that, of the two possible unilateral tests, the one performed is the one that matches the descriptive analysis, the other test being marked with a dash in the table. Also note that in case of equality between the average RPD values, the two unilateral tests are performed. For the second study, we consider the Wilcoxon–Mann–Whitney test with hypotheses analogous to those above. The p-values obtained are shown in Table 9 with the same notation as in Table 8.


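| Inst | RPD analysis, d-ACO and n-ACO (p-values) | Execution time analysis, d-ACO and n-ACO (p-value) |
|---|---|---|
| 4_1 | 0.001 | 0.001 |
| 4_2 | 0.001 | 0.001 |
| 4_3 | 0.001 | 0.001 |
| 4_4 | 0.001 | 0.001 |
| 4_5 | 0.003 | 0.006 |
| 4_6 | 0.001 | 0.001 |
| 4_7 | 0.001 | 0.001 |
| 4_8 | 0.001 | 0.001 |
| 4_9 | 0.001 | 0.001 |
| 4_10 | 0.500, 0.500 | 0.001 |
| 5_1 | 0.004 | 0.001 |
| 5_2 | 0.131 | 0.001 |
| 5_3 | 0.001 | 0.001 |
| 5_4 | 0.500, 0.500 | 0.001 |
| 5_5 | 0.001 | 0.001 |
| 5_6 | 0.500, 0.500 | 0.001 |
| 5_7 | 0.001 | 0.001 |
| 5_8 | 0.001 | 0.001 |
| 5_9 | 0.060 | 0.001 |
| 5_10 | 0.001 | 0.001 |
| 6_1 | 0.002 | 0.150 |
| 6_2 | 0.500, 0.500 | 0.117 |
| 6_3 | 0.500, 0.500 | 0.001 |
| 6_4 | 0.500, 0.500 | 0.001 |
| 6_5 | 0.005 | 0.054 |
| a_1 | 0.001 | 0.001 |
| a_2 | 0.001 | 0.006 |
| a_3 | 0.500, 0.500 | 0.001 |
| a_4 | 0.001 | 0.001 |
| a_5 | 0.001 | 0.001 |
| b_1 | 0.500, 0.500 | 0.291 |
| b_2 | 0.500, 0.500 | 0.001 |
| b_3 | 0.500, 0.500 | 0.565 |
| b_4 | 0.500, 0.500 | 0.106 |
| b_5 | 0.500, 0.500 | 0.001 |
| c_1 | 0.001 | 0.001 |
| c_2 | 0.500, 0.500 | 0.001 |
| c_3 | 0.001 | 0.001 |
| c_4 | 0.500, 0.500 | 0.013 |
| c_5 | 0.002 | 0.001 |
| d_1 | 0.001 | 0.002 |
| d_2 | 0.500, 0.500 | 0.124 |
| d_3 | 0.001 | 0.003 |
| d_4 | 0.500, 0.500 | 0.017 |
| d_5 | 0.500, 0.500 | 0.059 |
| nre_1 | 0.500, 0.500 | 0.001 |
| nre_2 | 0.001 | 0.001 |
| nre_3 | 0.500, 0.500 | 0.112 |
| nre_4 | 0.002 | 0.006 |
| nre_5 | 0.500, 0.500 | 0.001 |
| nrf_1 | 0.500, 0.500 | 0.108 |
| nrf_2 | 0.500, 0.500 | 0.001 |
| nrf_3 | 0.500, 0.500 | 0.292 |
| nrf_4 | 0.500, 0.500 | 0.001 |
| nrf_5 | 0.500, 0.500 | 0.001 |
| nrg_1 | 0.001 | 0.001 |
| nrg_2 | 0.001 | 0.261 |
| nrg_3 | 0.001 | 0.378 |
| nrg_4 | 0.038 | 0.574 |
| nrg_5 | 0.001 | 0.001 |
| nrh_1 | 0.001 | 0.001 |
| nrh_2 | 0.500, 0.500 | 0.001 |
| nrh_3 | 0.500, 0.500 | 0.001 |
| nrh_4 | 0.261 | 0.001 |
| nrh_5 | 0.500, 0.500 | 0.003 |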


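| Inst | RPD analysis, d-ABC and n-ABC (p-values) | Execution time analysis, d-ABC and n-ABC (p-value) |
|---|---|---|
| 4_1 | 0.001 | 0.001 |
| 4_2 | 0.500, 0.500 | 0.001 |
| 4_3 | 0.001 | 0.001 |
| 4_4 | 0.037 | 0.054 |
| 4_5 | 0.001 | 0.001 |
| 4_6 | 0.001 | 0.001 |
| 4_7 | 0.001 | 0.001 |
| 4_8 | 0.001 | 0.001 |
| 4_9 | 0.001 | 0.459 |
| 4_10 | 0.001 | 0.001 |
| 5_1 | 0.001 | 0.001 |
| 5_2 | 0.001 | 0.001 |
| 5_3 | 0.001 | 0.415 |
| 5_4 | 0.500, 0.500 | 0.001 |
| 5_5 | 0.500, 0.500 | 0.001 |
| 5_6 | 0.500, 0.500 | 0.001 |
| 5_7 | 0.001 | 0.001 |
| 5_8 | 0.500, 0.500 | 0.348 |
| 5_9 | 0.001 | 0.001 |
| 5_10 | 0.001 | 0.001 |
| 6_1 | 0.001 | 0.001 |
| 6_2 | 0.500, 0.500 | 0.001 |
| 6_3 | 0.500, 0.500 | 0.001 |
| 6_4 | 0.500, 0.500 | 0.001 |
| 6_5 | 0.500, 0.500 | 0.001 |
| a_1 | 0.001 | 0.001 |
| a_2 | 0.001 | 0.479 |
| a_3 | 0.001 | 0.135 |
| a_4 | 0.500, 0.500 | 0.025 |
| a_5 | 0.001 | 0.001 |
| b_1 | 0.001 | 0.001 |
| b_2 | 0.500, 0.500 | 0.001 |
| b_3 | 0.500, 0.500 | 0.012 |
| b_4 | 0.500, 0.500 | 0.001 |
| b_5 | 0.500, 0.500 | 0.001 |
| c_1 | 0.001 | 0.001 |
| c_2 | 0.001 | 0.001 |
| c_3 | 0.001 | 0.022 |
| c_4 | 0.001 | 0.001 |
| c_5 | 0.500, 0.500 | 0.001 |
| d_1 | 0.500, 0.500 | 0.097 |
| d_2 | 0.001 | 0.001 |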
d_30.001