Abstract
The set covering problem (SCP) is an NP-complete optimization problem that fits many problems in engineering. The traditional SCP formulation does not directly address either solution unsatisfiability or set redundancy. As a result, solving methods have to control these aspects to avoid producing unfeasible solutions or solutions that are not optimized in cost. In recent years, an alternative SCP formulation was proposed that directly covers both aspects. This alternative formulation has received limited attention because managing both aspects is currently considered straightforward. This paper questions whether the alternative formulation offers any advantage beyond addressing these two issues. Thus, two studies based on a metaheuristic approach are proposed to identify whether any concept in the alternative formulation could be used to enhance a solving method for the traditional SCP formulation. As a result, the authors conclude that concepts from the alternative formulation can be applied to guide the search process and to design heuristic feasibility operators. Thus, such concepts could be recommended for designing state-of-the-art algorithms addressing the traditional SCP formulation.
1. Introduction
The set covering problem (SCP) is a classical problem shown to be NP-complete by Karp [1], whose optimization version is NP-hard. Although it is a traditional problem, SCP is widely studied in the current scientific literature because it fits problems in relevant areas such as engineering, vehicle routing, the medical domain, and facility allocation (see e.g., [2–6]).
Most contributions in the SCP field consider the traditional SCP formulation introduced by Chvatal [7], defined as follows. Let $U$ be a set of objects and let $S$ be a collection of subsets of $U$, where each subset $S_j \in S$ has an associated nonnegative cost $c_j$. The purpose of SCP is then to find a minimum-cost family of subsets $S^{*} \subseteq S$ such that each element of $U$ belongs to at least one subset of $S^{*}$. This traditional formulation does not directly deal with two aspects: solution unsatisfiability and set redundancy. Solution unsatisfiability refers to the possibility of generating unfeasible solutions during the search. Set redundancy refers to the possibility of generating solutions that are not optimized in cost because they include redundant components (subsets). Because these two aspects are not part of the formulation, the solving method has to address them to ensure good performance.
In recent years, Bilal et al. [8] proposed an alternative SCP formulation whose main contribution is that both redundant sets and unfeasible solutions are directly penalized in the fitness function. Therefore, in contrast to the traditional formulation, the solving method does not have to control these aspects. Nevertheless, the contribution of the alternative formulation is questionable for two main reasons. First, Vasko et al. [9] demonstrated that the computational effort needed to remove redundant components from an SCP solution is almost negligible. Thus, including a redundancy removal operator in a solving method addressing the traditional formulation does not excessively increase the computational cost. Second, there are simple methods for transforming unfeasible solutions into feasible ones, such as the one proposed by Beasley and Chu [10]. As a consequence, the alternative formulation does not seem to be advantageous.
In the work proposing the alternative formulation, Bilal et al. [8] compared it with the traditional one. To this end, they solved the standard Beasley's OR library [11] with two algorithms: a simple descent heuristic (DH) addressing the alternative formulation and a standard greedy heuristic (GH) addressing the traditional one. DH outperformed GH, which supports the alternative formulation. However, Vasko et al. [9] later applied the same GH with the traditional formulation to the same instances, adding a simple redundancy removal operator, and obtained better results than those reported in Bilal et al. [8] for DH. Once again, the merit of the alternative formulation seems questionable. However, the comparison initiated in Bilal et al. [8] may have some limitations:
(i) The heuristic techniques considered might not be the most appropriate according to the current state of the art, in which metaheuristics, especially swarm intelligence algorithms (SIAs), generally provide the best results.
(ii) The authors compared the two formulations independently, using different algorithms. This approach could be correct, because the algorithm best suited to one formulation could be very different, even in type, from the one best suited to the other. However, there is no study combining aspects of the two formulations that could benefit the same solving method.
(iii) The authors did not use any statistical method to compare the two formulations; they only compared the average solution costs obtained.
On this basis, this paper questions whether the concepts involved in the alternative formulation offer any advantage beyond the novel problem formulation itself. This question leads us to propose two studies focused on solution quality, aimed at finding out whether any concept in the alternative formulation could be used to enhance a method based on the traditional formulation. To this end, the authors select two different metaheuristics suited to the studies, although other metaheuristics could have been selected without loss of generality.
Demonstrating the usefulness of concepts from the alternative formulation is valuable in itself, because future solving methods could then include these novel concepts. Consequently, the authors do not aim to obtain the best absolute results on Beasley's OR library, although the solutions obtained should still be reasonable, as will be discussed later.
The first study aims to identify whether any concept in the alternative formulation could be applied to guide the search process of a solving method addressing the traditional formulation. To this end, the authors generate two versions of the same SIA addressing the traditional formulation. In the first version, (i), the search process of the SIA is guided by concepts from the traditional formulation. In the second version, (ii), it is guided by concepts from the alternative formulation. Further details of this first study are as follows:
(i) This study requires a solving method whose search process is closely linked to the optimization problem. The ant colony optimization (ACO) algorithm meets this requirement, being very sensitive to the heuristic information operator designed for the problem to solve. Thus, two heuristic information operators are considered: in (i), a usual operator based on the traditional formulation, and in (ii), a novel operator based on concepts from the alternative formulation.
(ii) The two ACO approaches in (i) and (ii) include the same usual operator for removing redundant sets. No operator for transforming unfeasible solutions into feasible ones is considered, because ACO does not generate unfeasible solutions.
(iii) The two ACO approaches in (i) and (ii) are applied to solve Beasley's OR library. The results obtained are analysed through a widely accepted statistical method. Both approaches are tuned with the automatic method iterated F-Race, avoiding the errors associated with manual tuning [12].
The second study aims to identify whether any concept in the alternative formulation could be used to design the operators that transform unfeasible solutions into feasible ones while removing redundant columns. Note that operators of this type are widely applied in most SIAs solving SCP. To this end, the authors generate two versions of the same SIA addressing the traditional formulation. In the first version, (iii), the SIA uses the widely applied operator proposed by Beasley and Chu [10]. In the second version, (iv), the SIA uses a novel operator inspired by concepts from the alternative formulation. Further details of this second study are as follows:
(i) The second study requires a solving method that can generate unfeasible solutions. The artificial bee colony (ABC) algorithm is one of the many metaheuristics meeting this requirement. Thus, two different feasibility operators are considered, in (iii) and (iv).
(ii) The two ABC approaches in (iii) and (iv) are applied to solve Beasley's OR library. The results obtained are analysed through a widely accepted statistical method. In this case, the parameter configuration is taken from the literature (see [13]), because the feasibility operator is considered an external tool.
To summarize, the motivation of this research is to identify whether any concept from the alternative formulation could be used to solve the traditional SCP. To the best of our knowledge, this is the first work to address this question. Figure 1 summarizes the main tasks performed in the two studies. The contributions to the field are as follows:
(i) A first study is proposed to identify concepts of interest in the alternative SCP formulation that could be applied to guide a search method addressing the traditional SCP formulation. This study shows that the gain concept of the alternative SCP formulation is useful for guiding the search, outperforming a usual heuristic information operator from the literature.
(ii) A second study is proposed to identify concepts of interest in the alternative SCP formulation that could be used to design feasibility operators. This study shows that the gain concept of the alternative SCP formulation is useful for designing this type of operator, outperforming a usual operator from the literature. This contribution is especially interesting because operators of this type are widely applied in the metaheuristic SCP field as black-box methods.
These two contributions are especially relevant for future work on techniques for solving the traditional SCP formulation. The first study shows that the gain concept of the alternative SCP formulation is useful for guiding the search, which means that this concept could be considered when designing novel solving methods for SCP. The second study shows that the gain concept is useful for designing feasibility operators, which means that it could be used both to improve techniques already shown to be effective for the problem and to propose novel feasibility operators. As stated before, this second line of future work is especially interesting because feasibility operators are widely applied in the literature as black-box techniques outside the solving method.
The rest of this paper is structured as follows. Section 2 discusses related work. Section 3 provides a formal statement of both SCP formulations. Section 4 discusses the main aspects of the ACO algorithm used in the first study, and Section 5 those of the ABC algorithm used in the second study. Section 6 describes the experimental methodology and analyses the solution quality results. Finally, Section 7 concludes the paper and outlines future work. Table 1 summarizes the notation used throughout this work.
2. Related Work
The literature on SCP is extensive. Some authors considered exact algorithms, such as branch-and-bound and branch-and-cut techniques [14–16] or linear programming [17–19]. More recently, Caprara et al. [20] compared several exact algorithms, concluding that the best exact technique was CPLEX.
It is well known that exact techniques require excessive computational resources on large problems. Therefore, much effort has focused on heuristic and metaheuristic algorithms, which can find near-optimal (or even optimal) solutions for large problems in reasonable computing time.
Regarding heuristic methods, Chvatal [7] applied a classical GH. Although GHs are simple and fast to implement, they seldom produce good-quality solutions. Some researchers tried to improve GHs by adding randomness (see e.g., [9, 21–24]). Highly sophisticated heuristics based on Lagrangian relaxation have also been considered, yielding very good solutions (see e.g., [20, 25–27]). This brief review shows that few heuristic methods have been proposed for SCP in recent years, although proposing heuristics is usual in other optimization problems (e.g., [28]). The situation is the opposite for metaheuristics, which have gained great relevance during the last decades [29]. Thus, metaheuristics have been applied in fields such as networking [30], biological ontology alignment [31], shop scheduling [32], chemical analysis [33], and image encryption [34].
Metaheuristics combine basic heuristic methods with strategies for effectively exploring the search space. These techniques are usually split into three large groups: evolutionary algorithms (EAs), trajectory algorithms (TAs), and SIAs. To solve SCP, some authors considered EAs (see e.g., [10, 35–37]), while others applied TAs (see e.g., [38, 39]). However, the most widely applied metaheuristics for solving SCP are SIAs. Some examples are the artificial bee colony (ABC) algorithm [13, 40, 41], the ant colony optimization (ACO) algorithm [42–44], the firefly algorithm (FA) [45, 46], the teaching-learning-based optimization (TLBO) algorithm [47, 48], the electromagnetism-like (EM-like) algorithm [49, 50], the shuffled frog leaping algorithm (SFLA) [51], the fruit fly optimization algorithm (FFOA) [52], the cuckoo search algorithm (CSA) [53, 54], the cat swarm optimization (CSO) algorithm [55, 56], the jumping particle swarm optimization (JPSO) method [57], black hole optimization [54, 58], and the monkey search algorithm [59].
Analysing the previous contributions according to their results, exact algorithms provided excellent results on small SCP instances. For larger SCP instances addressed by approximate techniques (heuristics and metaheuristics), heuristics do not provide results as good as those of the more sophisticated metaheuristics, and the best results were usually obtained by SIAs. In this line, we should mention the valuable contributions of Naji-Azimi et al. [48, 49] and Balaji and Revathi [57], who obtained optimal or near-optimal solutions for classical SCP benchmarks.
All the works listed above consider the traditional SCP formulation. In contrast, the alternative formulation has received limited attention. As far as the authors know, only two works consider it. Bilal et al. [60] solved an SCP variant through an iterated tabu search metaheuristic, and Crawford et al. [61] compared the results obtained when solving the traditional and alternative formulations with the ACO algorithm.
The research presented in this paper was inspired by the very preliminary work discussed above (Crawford et al. [61]), which included no study of whether concepts in the alternative formulation could be used by solving methods addressing the traditional formulation. In Lanza-Gutierrez et al. [56], the authors also solved SCP with an SIA, namely a CSO algorithm, but with a completely different approach.
3. Set Covering Problem Statements
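Let $I = \{1, \dots, m\}$ and $J = \{1, \dots, n\}$ be the row and column sets, respectively. Let $U = \{u_1, \dots, u_m\}$ be a universe of $m$ elements and let $S = \{S_1, \dots, S_n\}$ be a collection of $n$ subsets of $U$, such that $S_j \subseteq U$ for all $j \in J$ and $\bigcup_{j \in J} S_j = U$. Each subset $S_j$ has an associated nonnegative cost $c_j \geq 0$, where $j \in J$.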
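The optimization problem is formally defined by assuming a binary matrix $A$ of $m$ rows and $n$ columns, where the rows are the elements of the universe $U$ and the columns are the subsets $S_j$. Let $a_{ij}$ be the value in cell $(i, j)$ of $A$, given by

$$ a_{ij} = \begin{cases} 1 & \text{if } u_i \in S_j, \\ 0 & \text{otherwise}, \end{cases} \tag{1} $$

for $i \in I$ and $j \in J$. Thus,

$$ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}. \tag{2} $$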
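The objective of SCP is to find a subset of $S$ covering (containing) all the elements of $U$ at minimal cost. A solution to SCP is usually expressed as a binary vector $x = (x_1, \dots, x_n)$, where

$$ x_j = \begin{cases} 1 & \text{if } S_j \text{ is part of the solution}, \\ 0 & \text{otherwise}. \end{cases} \tag{3} $$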
Then, the cost of the solution is

$$ C(x) = \sum_{j=1}^{n} c_j x_j. \tag{4} $$
Next, we give a formal statement of the two SCP formulations.
3.1. Traditional Formulation
The SCP fitness function is

$$ f(x) = \sum_{j=1}^{n} c_j x_j. \tag{5} $$
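Then, given $m$ elements and $n$ subsets, the objective is to find a collection of subsets to

$$ \text{minimize } f(x) \tag{6} $$

subject to

$$ \sum_{j=1}^{n} a_{ij} x_j \geq 1, \quad \forall i \in I, \tag{7} $$

$$ x_j \in \{0, 1\}, \quad \forall j \in J. \tag{8} $$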
The constraint in equation (7) ensures that each row is covered by at least one column; if it is not satisfied, the solution is considered unfeasible. The constraint in equation (8) only ensures the integrality of the mathematical program. Hence, it does not need to be handled as a constraint in heuristic approaches.
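As an illustration of equations (1) to (8), the following Python sketch builds the binary matrix from a list of subsets, evaluates the traditional cost, and checks constraint (7). It is a minimal sketch, assuming 0-based row and column indices and subsets given as lists of row indices; the function names and the toy instance are hypothetical.

```python
from typing import List, Sequence


def build_matrix(m: int, subsets: Sequence[Sequence[int]]) -> List[List[int]]:
    """Equations (1)-(2): a[i][j] = 1 if row i belongs to subset S_j, else 0."""
    a = [[0] * len(subsets) for _ in range(m)]
    for j, rows in enumerate(subsets):
        for i in rows:
            a[i][j] = 1
    return a


def solution_cost(x: Sequence[int], c: Sequence[float]) -> float:
    """Equations (4)-(5): total cost of the selected columns (x_j = 1)."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_feasible(a: Sequence[Sequence[int]], x: Sequence[int]) -> bool:
    """Constraint (7): every row is covered by at least one selected column."""
    return all(any(aij and xj for aij, xj in zip(row, x)) for row in a)


# Toy instance (hypothetical data): 3 rows, 4 columns.
A = build_matrix(3, [[0, 1], [1, 2], [2], [0]])
costs = [3, 2, 1, 1]
x = [0, 1, 0, 1]  # selects the second and fourth subsets (0-based columns 1 and 3)
print(solution_cost(x, costs), is_feasible(A, x))  # 3 True
```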
3.2. Alternative Formulation
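In this formulation, covering an element is identified with collecting a gain at a given cost. Let $c^{*}_i$ be the cost of the cheapest set among the sets covering element $u_i$, given by

$$ c^{*}_i = c_{j^{*}_i}, \quad j^{*}_i \in \arg\min_{j \in J \,:\, a_{ij} = 1} c_j, \tag{9} $$

where $\arg\min$ provides the point or points at which a function attains its minimum value. Then, the gain of covering element $u_i$ is

$$ g_i = c^{*}_i + \epsilon, \tag{10} $$

where $\epsilon$ is a very small positive constant. Based on this gain concept, the SCP fitness function is

$$ f(x) = \sum_{j=1}^{n} c_j x_j - \sum_{i=1}^{m} g_i y_i, \tag{11} $$

where

$$ y_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} a_{ij} x_j \geq 1, \\ 0 & \text{otherwise}. \end{cases} \tag{12} $$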
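Then, given $m$ elements and $n$ subsets, the objective is to find a collection of subsets to

$$ \text{minimize } f(x) \tag{13} $$

subject to

$$ x_j \in \{0, 1\}, \quad \forall j \in J, \tag{14} $$

$$ y_i \in \{0, 1\}, \quad \forall i \in I. \tag{15} $$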
The constraints in equations (14) and (15) only ensure the integrality of the mathematical program. Under this formulation, no solution is discarded as unfeasible, unlike in the traditional formulation. Solutions leaving some rows uncovered still exist for the problem, but the alternative formulation penalizes them, since an uncovered row contributes no gain, instead of discarding them. Moreover, it directly penalizes redundant sets, beyond the higher cost they cause in the traditional formulation. Thus, redundancy removal operators are not needed, in contrast to the traditional formulation, where they are highly recommended.
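The following Python sketch illustrates the gain concept and why it penalizes both unfeasible and redundant solutions. It assumes the sign convention $f(x) = \sum_j c_j x_j - \sum_i g_i y_i$, to be minimized (maximizing the negated expression is equivalent), and a hypothetical value of $\epsilon$; the names and toy data are illustrative only.

```python
from typing import List, Sequence

EPSILON = 1e-3  # hypothetical value for the small positive constant in equation (10)


def gains(a: Sequence[Sequence[int]], c: Sequence[float],
          eps: float = EPSILON) -> List[float]:
    """Equations (9)-(10): g_i = cost of the cheapest column covering row i, plus eps.

    Assumes every row is covered by at least one column (the union of the S_j is U).
    """
    return [min(cj for aij, cj in zip(row, c) if aij) + eps for row in a]


def alt_fitness(a: Sequence[Sequence[int]], c: Sequence[float],
                x: Sequence[int], g: Sequence[float]) -> float:
    """Assumed sign convention for equations (11)-(12), to be minimized:
    total cost of the selected columns minus the gains of the covered rows.

    An uncovered row forfeits its gain, and a redundant column adds cost
    without adding any gain, so both situations worsen the fitness.
    """
    total_cost = sum(cj * xj for cj, xj in zip(c, x))
    y = [1 if any(aij and xj for aij, xj in zip(row, x)) else 0 for row in a]  # y_i
    return total_cost - sum(gi * yi for gi, yi in zip(g, y))


# Toy instance (hypothetical data): 3 rows, 4 columns.
A = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]]
costs = [3, 2, 1, 1]
g = gains(A, costs)
for x in ([0, 1, 0, 1], [0, 1, 0, 0], [0, 1, 1, 1]):  # optimal, unfeasible, redundant
    print(x, round(alt_fitness(A, costs, x, g), 3))
```

Under this convention, adding the cheapest column that covers an uncovered row lowers $f$ by at least $\epsilon$, so every minimizer covers all rows, and for feasible solutions $f$ differs from the traditional cost only by the constant $\sum_i g_i$.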
4. Ant Colony Optimization
The ACO algorithm is inspired by the behaviour of ant colonies. ACO searches for the optimal path in a graph by means of an artificial ant colony. The ants work cooperatively, guided by problem-dependent heuristic information and by pheromone trails, through which they communicate. Pheromone trails are a type of distributed information that is dynamically updated by the ants. Pheromones keep the experience gained during the search process while highlighting promising areas of the search space.
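Let $x^{k}$ be the solution generated by an ant at construction step $k$. Let $R_k$ be the set of uncovered rows in $x^{k}$, and let $N_k$ be the set of unselected columns in $x^{k}$. In the scientific literature [42, 62], a usual heuristic information expression for a column $j$ at step $k$ is

$$ \eta^{d}_{j,k} = \frac{e_{j,k}}{c_j}, \tag{16} $$

where $e_{j,k}$ is the number of uncovered rows in $R_k$ that could be covered by column $j$ at step $k$. This value is

$$ e_{j,k} = \left| R_k \cap I_j \right|, \tag{17} $$

where $|\cdot|$ denotes the cardinality of a set and $I_j = \{ i \in I : a_{ij} = 1 \}$ denotes the set of rows covered by column $j$, for $j \in N_k$.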
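In this work, we propose a heuristic information expression inspired by the gain concept of the alternative formulation introduced in Section 3.2:

$$ \eta^{n}_{j,k} = \frac{G_{j,k}}{c_j}, \tag{18} $$

where $G_{j,k}$ is the sum of the gains of covering the uncovered rows in $R_k$ by column $j$ at step $k$. Thus,

$$ G_{j,k} = \sum_{i \in R_k \cap I_j} g_i, \tag{19} $$

where $g_i$ is given in equation (10).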
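To simplify the notation, we define the heuristic information for a column $j$ at step $k$ depending on whether the traditional or the alternative formulation is considered, that is,

$$ \eta_{j,k} = \begin{cases} \eta^{d}_{j,k} & \text{for the traditional formulation}, \\ \eta^{n}_{j,k} & \text{for the alternative formulation}. \end{cases} \tag{20} $$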
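A minimal Python sketch of the two heuristic information expressions, assuming the uncovered rows and the rows covered by each column are kept as Python sets; `heuristic_info` and the toy data are hypothetical, not the authors' Java code. With `alternative=False` it returns the number of newly covered rows divided by the column cost, and with `alternative=True` the sum of their gains divided by the cost.

```python
from typing import Sequence, Set


def heuristic_info(j: int, uncovered: Set[int], rows_of: Sequence[Set[int]],
                   c: Sequence[float], g: Sequence[float],
                   alternative: bool) -> float:
    """Heuristic information eta_{j,k} for column j at the current step.

    `uncovered` plays the role of R_k and rows_of[j] the role of I_j.
    Assumes c[j] > 0, as in the non-unicost OR-library instances.
    """
    newly_covered = uncovered & rows_of[j]            # R_k ∩ I_j
    if alternative:
        score = sum(g[i] for i in newly_covered)      # G_{j,k}: sum of gains
    else:
        score = len(newly_covered)                    # e_{j,k}: number of rows
    return score / c[j]


# Toy instance (hypothetical data); g holds the gains of equation (10) with eps = 1e-3.
rows_of = [{0, 1}, {1, 2}, {2}, {0}]
costs = [3, 2, 1, 1]
g = [1.001, 2.001, 1.001]
for alt in (False, True):
    print([round(heuristic_info(j, {0, 1, 2}, rows_of, costs, g, alt), 3) for j in range(4)])
```

On this toy instance the count-based expression ties columns 1, 2, and 3, whereas the gain-based one clearly prefers column 1, because it covers row 1, whose cheapest cover (and hence gain) is the largest.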
Algorithm 1 shows the procedure of a general ACO. Its main steps are detailed next.
(1) Initialization: first, we propose to preprocess the SCP instances by using column domination and column inclusion [18]. Next, the algorithm parameters are initialized. Traditionally, ACO algorithms do not include an initialization step to generate the solutions in the population. Instead, pheromone trails are randomly assigned, and solutions are then generated according to this random information. As a result, the algorithm may need to run some iterations before having reliable information about the quality of the solution components. To address this, we propose to include a greedy population initialization step in ACO based on Lu and Vasko [48]. This step corresponds to line 1 of Algorithm 1.
(2) Solution construction method: each ant starts with an empty solution, to which columns are added iteratively until all rows are covered. Consequently, all the solutions generated are feasible. Most ACO-based algorithms consider a similar state transition rule, preferring solution components with high pheromone and heuristic values (see e.g., [42, 62]). A possible way to generate solutions is the single row oriented method (SROM) proposed by Ren et al. [43], which was shown in that work to reduce the computational burden compared to other methods. Thus, SROM is used in this paper as the solution construction method. Additionally, we also consider the ant colony system (ACS) proposed by Dorigo and Gambardella [63] as an extension of the ACO algorithm. ACS includes a pseudorandom-proportional rule, providing a direct way to balance exploration and exploitation during the selection of solution components.
If $j_k$ denotes the column selected at step $k$, then the ACS rule is

$$ j_k = \begin{cases} \arg\max_{j \in N_k} \left\{ \tau_j^{\alpha}\, \eta_{j,k}^{\beta} \right\} & \text{if } q \leq q_0, \\ \hat{j}_k & \text{otherwise}, \end{cases} \tag{21} $$

where $q$ is a random number uniformly distributed in $[0, 1]$, $q_0$ is a parameter determining the relative importance of exploitation versus exploration, $\hat{j}_k$ is the column provided by SROM at step $k$, and $\arg\max$ provides the point or points at which a function attains its maximum value. Thus, if $q \leq q_0$, the rule returns the unselected column with the highest value of $\tau_j^{\alpha}\, \eta_{j,k}^{\beta}$ at step $k$, where $\tau_j$ denotes the pheromone trail of column $j$, and $\alpha$ and $\beta$ denote the relative importance of pheromone trails and heuristic information, respectively. This step corresponds to line 4 of Algorithm 1.
(3) Local search: local search is well known to be effective for improving ACO performance. We consider the local search proposed by Ren et al. [43], which, for each column in the solution, determines whether the column should be removed or replaced by one or more columns while keeping the solution feasible. This step corresponds to line 5 of Algorithm 1.
(4) Update pheromone trails: pheromone trails are updated following the max-min ant system (MMAS) proposed by Stützle and Hoos [64]. In this method, after each ant generates a full solution, all pheromone trails are decreased uniformly to simulate evaporation, forgetting part of the historical experience. Next, a small amount of pheromone is deposited on the columns of a reference solution, which in MMAS can be either the best solution found in the current iteration or the best solution found since the beginning of the algorithm. Like Ren et al. [43], we opted for the latter, so the search can concentrate quickly around the best solution found. This strategy could lead to poor performance if the algorithm becomes trapped in bad regions of the search space; however, this risk is reduced by the ACS rule detailed in Step (2). Formally, pheromone trails are updated following

$$ \tau_j \leftarrow \rho\, \tau_j + \Delta\tau_j, \tag{22} $$

where $\rho$ is the pheromone persistence and $\Delta\tau_j$ is the amount of pheromone put on column $j$, as provided in Stützle and Hoos [64].
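Stützle and Hoos also proposed bounding the pheromone trails within $[\tau_{\min}, \tau_{\max}]$, with $\tau_{\max} = 1 / \big((1 - \rho) \sum_{j \in J} c_j x^{gb}_j\big)$ and $\tau_{\min}$ obtained from $\tau_{\max}$ through a ratio coefficient $r$, where $x^{gb}$ is the best solution found since the beginning of the algorithm. This step corresponds to line 7 of Algorithm 1.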
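To make Steps (2) and (4) concrete, the following Python sketch shows the ACS pseudorandom-proportional selection and an MMAS-style pheromone update with trail limits. It is a minimal sketch under stated assumptions, not the authors' Java implementation: `fallback` is a hypothetical stand-in for SROM, the deposit of $1 / C(x^{gb})$ on the best-so-far columns follows the usual MMAS choice, and $\tau_{\min} = r\,\tau_{\max}$ is an assumed form of the ratio coefficient.

```python
import random
from typing import Callable, Dict, List, Sequence


def acs_select(candidates: Sequence[int], tau: Sequence[float],
               eta: Dict[int, float], alpha: float, beta: float, q0: float,
               fallback: Callable[[], int], rng: random.Random) -> int:
    """ACS pseudorandom-proportional rule.

    If q <= q0, exploit: return the candidate maximizing tau_j^alpha * eta_j^beta.
    Otherwise, return the column proposed by `fallback`, a hypothetical
    stand-in for the SROM construction step.
    """
    q = rng.random()  # uniform in [0, 1)
    if q <= q0:
        return max(candidates, key=lambda j: tau[j] ** alpha * eta[j] ** beta)
    return fallback()


def mmas_update(tau: List[float], best_columns: Sequence[int], best_cost: float,
                rho: float, ratio: float) -> None:
    """In-place MMAS update tau_j <- rho * tau_j + delta_j, clipped to the limits.

    Assumptions of this sketch: delta_j = 1 / best_cost for the columns of the
    best-so-far solution (0 otherwise), tau_max = 1 / ((1 - rho) * best_cost),
    and tau_min = ratio * tau_max.
    """
    tau_max = 1.0 / ((1.0 - rho) * best_cost)
    tau_min = ratio * tau_max
    chosen = set(best_columns)
    for j in range(len(tau)):
        deposit = 1.0 / best_cost if j in chosen else 0.0
        tau[j] = min(tau_max, max(tau_min, rho * tau[j] + deposit))


# Usage with hypothetical values: three candidate columns, best-so-far cost 4.
rng = random.Random(42)
tau = [1.0, 1.0, 1.0]
eta = {0: 0.5, 1: 2.0, 2: 1.0}
selected = acs_select([0, 1, 2], tau, eta, alpha=1.0, beta=2.0, q0=0.9,
                      fallback=lambda: 2, rng=rng)
mmas_update(tau, best_columns=[selected], best_cost=4.0, rho=0.9, ratio=0.01)
```

Clipping after every update keeps all trails between the two limits, which is what prevents the best-so-far reinforcement from driving the other columns' selection probability to zero.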

5. Artificial Bee Colony
The ABC algorithm is inspired by the behaviour of honey bees, the search process being guided by three types of artificial bees: workers, onlookers, and scouts. The general procedure of ABC is shown in Algorithm 2. It starts by generating an initial population of solutions: for every row, a random column able to cover it is selected until all rows are covered. Next, over the iterations, the population is managed by workers and onlookers, which are randomly recruited in each iteration. The behaviour of each bee is as follows:
(i) A worker takes a random solution from the population and generates a new solution by adding a random number of columns, between 0 and a maximum addition percentage of the columns in the SCP instance. This step is followed by the elimination of a random number of columns, between 0 and a maximum elimination percentage of the columns in the SCP instance. The fitness value of the individual generated by the worker is then obtained. If the fitness value of the new individual is better than that of the previous individual assigned to the worker, the new individual replaces the previous one and the counter of trials for improving the current solution is reset to zero. Otherwise, this counter is increased. If the counter reaches a given threshold, the worker is transformed into a scout bee.
(ii) An onlooker generates a new solution following a procedure similar to that of workers, but it selects the solution with a probability proportional to its quality instead of randomly. Onlookers do not use the trial threshold.
(iii) A scout discards its current solution and generates a new one following the same strategy used for the initial population. As expected, its counter of trials is reset to zero.
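The worker behaviour described above can be sketched in Python as follows. This is a hypothetical sketch, not the implementation of Crawford et al. [13]: `max_add_pct` and `max_del_pct` stand in for the two percentage parameters, solutions are binary lists, and fitness is assumed to be a cost to be minimized, evaluated after any repair.

```python
import random
from typing import List, Sequence, Tuple


def worker_move(x: Sequence[int], max_add_pct: float, max_del_pct: float,
                rng: random.Random) -> List[int]:
    """Neighbour of binary solution x: add, then eliminate, random columns.

    Up to max_add_pct % of the n columns are switched on among the unselected
    ones, then up to max_del_pct % are switched off among the selected ones.
    The result may be unfeasible and must then be repaired.
    """
    n = len(x)
    y = list(x)
    unselected = [j for j in range(n) if not y[j]]
    k_add = min(rng.randint(0, int(n * max_add_pct / 100)), len(unselected))
    for j in rng.sample(unselected, k_add):
        y[j] = 1
    selected = [j for j in range(n) if y[j]]
    k_del = min(rng.randint(0, int(n * max_del_pct / 100)), len(selected))
    for j in rng.sample(selected, k_del):
        y[j] = 0
    return y


def greedy_accept(current_fit: float, new_fit: float,
                  trials: int) -> Tuple[bool, int]:
    """Worker acceptance with a trial counter (lower fitness is better).

    Returns (replace, trials): the counter is reset on improvement and
    increased otherwise; the caller turns the worker into a scout once
    `trials` reaches the threshold.
    """
    if new_fit < current_fit:
        return True, 0
    return False, trials + 1


# Usage with hypothetical parameters on a 10-column solution.
rng = random.Random(7)
neighbour = worker_move([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], 20, 20, rng)
```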

Because both workers and onlookers can generate unfeasible solutions through the random elimination of columns, this issue must be managed. Crawford et al. [13] proposed to use the usual heuristic by Beasley and Chu [10], which transforms unfeasible solutions into feasible ones and then reduces the cost of the solution. This heuristic is shown in Algorithm 3. Its first stage transforms an unfeasible solution into a feasible one, and its second stage removes redundant columns. In this algorithm, $X$, $\alpha_i$, and $w_i$ denote the set of columns in a solution, the set of columns that cover row $i$, and the number of columns in $X$ that cover row $i$, respectively.

Focusing on the first stage, making a solution feasible requires identifying the uncovered rows and adding columns to the solution so that all rows are covered. In the proposal of Beasley and Chu [10], the search for the missing columns is guided by

$$ \frac{c_j}{\left| R \cap I_j \right|}, \tag{23} $$

where $R$ is the set of uncovered rows in the solution and $I_j$ is the set of rows covered by column $j$, so that $|R \cap I_j|$ is the number of uncovered rows that column $j$ could cover. That is, the criterion is the ratio between the cost of a column and the number of uncovered rows that it could cover.
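As an alternative strategy, this paper proposes to guide the search with concepts from the alternative SCP formulation, that is,

$$ \frac{c_j}{\sum_{i \in R \cap I_j} g_i}, \tag{24} $$

where $g_i$ is the gain in equation (10), $R$ is the set of uncovered rows, and $I_j$ is the set of rows covered by column $j$; i.e., the criterion is the ratio between the cost of a column and the sum of the gains of covering the uncovered rows by such a column.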
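The two stages of Algorithm 3, with either selection criterion, can be sketched in Python as follows. This is a minimal sketch under assumptions: rows and columns are 0-based, uncovered rows are processed in increasing index order, ties keep the first minimizing column of $\alpha_i$, and redundant columns are examined from the most to the least expensive; passing the gains `g` switches from the traditional criterion to the gain-based one. The names are hypothetical.

```python
from typing import Optional, Sequence, Set


def repair(solution: Set[int], rows_of: Sequence[Set[int]],
           cols_of: Sequence[Sequence[int]], c: Sequence[float],
           g: Optional[Sequence[float]] = None) -> Set[int]:
    """Two-stage feasibility operator in the style of Beasley and Chu.

    rows_of[j] is I_j (rows covered by column j) and cols_of[i] is alpha_i
    (columns covering row i). With g=None the criterion is c_j / |R ∩ I_j|;
    otherwise it is c_j divided by the sum of g_i over R ∩ I_j.
    """
    x = set(solution)                                              # X
    m = len(cols_of)
    w = [sum(1 for j in cols_of[i] if j in x) for i in range(m)]   # w_i
    uncovered = {i for i in range(m) if w[i] == 0}                 # R

    def criterion(j: int) -> float:
        newly = uncovered & rows_of[j]
        value = len(newly) if g is None else sum(g[i] for i in newly)
        return c[j] / value

    # Stage 1: cover each uncovered row with the column minimizing the criterion.
    for i in sorted(uncovered):
        if i not in uncovered:                  # covered by a column added earlier
            continue
        best = min(cols_of[i], key=criterion)   # first minimizer in alpha_i
        x.add(best)
        for r in rows_of[best]:
            w[r] += 1
        uncovered -= rows_of[best]

    # Stage 2: remove redundant columns, most expensive first (assumed order).
    for j in sorted(x, key=lambda col: c[col], reverse=True):
        if all(w[r] >= 2 for r in rows_of[j]):
            x.remove(j)
            for r in rows_of[j]:
                w[r] -= 1
    return x


# Toy instance (hypothetical data): row 0 is left uncovered by {1}.
rows_of = [{0, 1}, {1, 2}, {2}, {0}]
cols_of = [[0, 3], [0, 1], [1, 2]]
costs = [3, 2, 1, 1]
print(repair({1}, rows_of, cols_of, costs))  # {1, 3}
```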
6. Experimentation
This section discusses the experimental methodology and analyses the results obtained in the first and second studies.
6.1. Experimental Methodology
In each study, we apply the two corresponding approaches (of ACO in the first study and of ABC in the second) to solve Beasley's OR library. This dataset is widely used to report empirical results in the current literature (see e.g. [9, 40, 48]). The library includes 65 randomly generated non-unicost instances, as detailed in Table 2. For further details about how these instances were generated, see [18, 65]. For each instance, the library provides the number of rows, the number of columns, and the cost of each column. Additionally, for each row, it provides the number of columns that cover it and the list of those columns. For a study of the search-space complexity of this benchmark, we refer readers to Finger et al. [66], who used the fitness-distance correlation landscape metric for this purpose. In Table 2, “Density (%)” is the percentage of entries equal to 1 in the matrix in equation (2). “Optimal solution” takes one of two values, known or unknown, depending on whether the optimal solution of the instances has been proven or could not be verified because of the problem's complexity. Thus, for sets nrg and nrh, only the best solutions found historically are known.
We combine two stopping criteria in the experimentation: reaching a given number of fitness evaluations or finding the optimal solution. The algorithm ends as soon as either condition holds. For ACO, we use 10,000 fitness evaluations as the stop condition; as discussed later, this value is enough for the experimentation. For ABC, we consider 500 iterations, following Crawford et al. [13].
Before running the experimentation, both algorithms must be configured. For ABC, we can use the parameters provided in Crawford et al. [13] for the two ABC approaches considered here because (i) those authors also solved SCP and (ii) the approach based on the alternative formulation only modifies the heuristic feasibility operator, leaving the operators that guide the search unchanged. For ACO, we must configure the two approaches considered here because (i) no parameter set is available from previous works for the ACO variant used and (ii) the approach based on the alternative formulation changes how the search is performed compared with the traditional one, so the two approaches must be configured independently.
Thus, for the second study (ABC), we take the parameter values reported in Crawford et al. [13]. For the first study (ACO), we obtain the parameters of the two ACO approaches using iterated F-Race. This method configures a metaheuristic starting from a set of candidate values for each parameter. It then discards poorly performing configurations as soon as sufficient statistical evidence against them is gathered, focusing on the most promising ones.
Specifically, we use the iterated F-Race implementation for R by López-Ibáñez et al. [67]. Following the authors' recommendations, we divided the benchmark into three groups according to problem size to obtain a consistent configuration: Group A includes instance sets 4, 5, and 6; Group B includes sets a, b, c, and d; and Group C includes sets nre, nrf, nrg, and nrh. Table 3 shows the candidate values for each parameter, based on previous works [43], and the configurations obtained for each group and ACO approach. Here, dACO denotes ACO with the traditional heuristic information expression and nACO denotes ACO with the alternative one.
Once both ACO approaches are configured, 30 independent runs are performed for each instance and algorithm. Next, we analyse whether there are significant differences between the two algorithms regarding solution quality and execution time for each instance. To this end, we use the Wilcoxon–Mann–Whitney test [68] to validate several hypotheses. The implementation of this test is the one provided in the performance assessment tool described in Knowles et al. [69] and available in Fonseca et al. [70].
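For readers without access to the performance assessment tool of Knowles et al. [69], the following is an independent Python sketch of the two-sided Wilcoxon–Mann–Whitney test, not that tool. It uses average ranks for ties and the large-sample normal approximation without continuity correction, which we assume is adequate for 30 runs per algorithm; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon-Mann-Whitney test with the normal approximation.

    Returns (U statistic of sample a, p-value). The variance includes the
    standard correction for ties; no continuity correction is applied.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all observations are identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)


# Usage with hypothetical final costs of two algorithms on one instance.
u, p = mann_whitney([431, 430, 432, 433, 431], [429, 430, 429, 431, 429])
```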
As a solution quality metric, we consider the relative percentage deviation (RPD), which evaluates how far the solution found by the metaheuristic is from the optimal solution known in the literature. The lower the RPD value, the better the solution obtained. Because the known optimal solutions can be considered the solutions provided by exact techniques, the RPD indirectly compares the performance of the metaheuristic with that of a generic exact technique solving the same problem instance. Three RPD metrics are included: the average RPD, $RPD_{avg}$; the minimum RPD, $RPD_{min}$; and the maximum RPD, $RPD_{max}$. They are calculated as

$$ RPD_{avg} = 100 \cdot \frac{C_{avg} - C_{opt}}{C_{opt}}, \tag{25} $$

$$ RPD_{min} = 100 \cdot \frac{C_{min} - C_{opt}}{C_{opt}}, \tag{26} $$

$$ RPD_{max} = 100 \cdot \frac{C_{max} - C_{opt}}{C_{opt}}, \tag{27} $$

where $C_{avg}$, $C_{min}$, and $C_{max}$ denote the average, minimum, and maximum solution costs, respectively, from a distribution of 30 samples solving an instance, and $C_{opt}$ is the optimal solution cost of the instance. Note that the cost of the best solution found during one run is given by equation (4). Thus, although the two SCP fitness functions differ, the results obtained can be compared without loss of generality.
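The three RPD metrics reduce to simple arithmetic on the final costs of the independent runs; a minimal Python sketch follows, where `rpd_metrics` is a hypothetical name and `c_opt` is the optimal cost of the instance (or, for sets nrg and nrh, the best-known cost, since their optima are unknown).

```python
from statistics import mean
from typing import Dict, Sequence


def rpd_metrics(run_costs: Sequence[float], c_opt: float) -> Dict[str, float]:
    """RPD_avg, RPD_min and RPD_max (in %) of the final costs of independent runs.

    c_opt is the optimal cost of the instance (the best-known cost when the
    optimum is unknown, as for the nrg and nrh sets).
    """
    def rpd(value: float) -> float:
        return 100.0 * (value - c_opt) / c_opt

    return {"avg": rpd(mean(run_costs)), "min": rpd(min(run_costs)),
            "max": rpd(max(run_costs))}


# Hypothetical costs of three runs on an instance whose optimum is 100.
print(rpd_metrics([100, 102, 104], 100))  # {'avg': 2.0, 'min': 0.0, 'max': 4.0}
```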
Regarding the computing platform used for the experimentation, the authors used two computing nodes in a computing cluster. Each node has two 2.33 GHz Intel Xeon E5410 processors with four cores each and 16 GB of 1600 MHz DDR3 RAM, running a Linux operating system. All executions were performed on a single core without parallelism because exploring parallelism is not the goal of this paper. This infrastructure was chosen because the statistical tests required to validate the proposal demand many independent executions; for a single execution or a reduced set of them, a conventional computer would suffice. To prevent operating system tasks from affecting the total computing time measured during the experimentation, one core in each computing node was left idle. Additionally, the authors checked that the RAM in each computing node was sufficient to avoid memory swapping. As expected, the computing power of the processor directly affects the time required to solve the problem, since most operations involve CPU computation and access to main memory (RAM). Note that the same computing nodes were used for all the experiments in this work so as not to bias the conclusions regarding computing time.
Regarding programming languages, the ACO algorithm was fully implemented in Java for Java Development Kit (JDK) 1.7, and the ABC algorithm was fully implemented in C. The scripts for managing the executions and collecting the results were implemented in bash. Note that using two different programming languages for implementing ABC and ACO does not affect the conclusions regarding computing time, because ABC and ACO computing times are not compared with each other in this work.
6.2. Analysis of the Experimental Results
Tables 4 and 5 show, for each instance and case study, the RPD metrics (average, minimum, and maximum RPD), the average execution time to reach the stop condition, and the average number of fitness evaluations needed to reach the best solution found during the exploration. In both tables, the lowest values of the compared metrics are given in bold for each instance. In Table 5, dABC denotes ABC with the default heuristic feasibility operator and nABC denotes ABC with the heuristic feasibility operator based on the alternative SCP formulation.
Analysing both tables regarding RPD metrics, we observe that (i) nACO seems to outperform or match dACO in most instances and (ii) nABC seems to outperform or match dABC in most instances. Focusing on computing times, we observe a similar behaviour: nACO appears to need a shorter time than dACO, except for the b, c, and nrh instances, and nABC generally appears to need a shorter time than dABC. Focusing on evaluations, this field is related to the number of iterations reached as follows. In ACO, evaluations are performed for each iteration. In ABC, a number of evaluations varying between and two times are performed for each iteration. Thus, for ABC, the maximum number of evaluations will be a value in the range . For ACO, the maximum number of evaluations will be 10,000, as defined before. Analysing this field, the authors conclude that the number of evaluations needed is far from the stop condition defined, so the stop condition is adequate in both studies.
Table 6 shows the average RPD metrics for each instance group, where the “ipv” field denotes the percentage of improvement obtained by using the alternative version of the algorithm instead of the default one. Analysing this table, we observe that (i) nACO provides better RPD values than dACO for all the groups and (ii) nABC also provides better RPD values than its default version, dABC. The RPD metrics obtained are in line with other works from the literature, with RPD values lower than 1.0%; in this regard, Table 7 shows the values of some recent successful approaches solving the problem. However, note that the purpose of this work is not to outperform other techniques on the standard SCP benchmark.
At this point, the alternative version of each algorithm seems to provide better performance in both cases. However, we do not know whether the differences observed are significant. To this end, the statistical methodology described by Lanza-Gutierrez et al. [56] was applied. First, we removed all possible outliers. Then, we analysed the normality of the data, finding that a normal distribution cannot be assumed in any case. Consequently, the median, rather than the mean, should be used as the average value when calculating the average RPD in equation (25).
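The outlier criterion of the cited procedure is not detailed in this section. Purely as an illustration, the sketch below assumes Tukey's 1.5 × IQR fences to discard outliers and then takes the median of the remaining samples as the average value; the function names and sample costs are hypothetical.

```python
from statistics import median, quantiles


def remove_outliers(samples, k=1.5):
    """Drop samples outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey's rule stands in for the outlier criterion of the
    statistical procedure cited in the text.
    """
    q1, _, q3 = quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if low <= x <= high]


def robust_average(samples):
    """Median of the outlier-free samples, used as the 'average' value."""
    return median(remove_outliers(samples))


costs = [429, 429, 430, 431, 429, 430, 512]  # hypothetical; 512 is an outlier
print(robust_average(costs))  # 429.5
```

The median is a natural choice here because, unlike the mean, it does not presuppose a symmetric or normal distribution of the 30 samples.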
Next, we study whether there are significant differences in the solution quality of the algorithms. Starting with the first study, we consider the Wilcoxon–Mann–Whitney test with hypotheses $H_0: \overline{\mathrm{RPD}}_i \geq \overline{\mathrm{RPD}}_j$ and $H_1: \overline{\mathrm{RPD}}_i < \overline{\mathrm{RPD}}_j$, with $i, j \in \{\text{dACO}, \text{nACO}\}$ and $i \neq j$, where $\overline{\mathrm{RPD}}_i$ and $\overline{\mathrm{RPD}}_j$ are the average RPD of algorithms $i$ and $j$ for a given instance, respectively. The p-values obtained for each instance and ACO approach are shown in Table 8 under the title RPD analysis, where p-values lower than the significance level $\alpha = 0.05$ (i.e., a confidence level of 0.95) are given in bold. Note that, of the two possible unilateral tests, the one performed was the one matching the descriptive analysis; the other test is marked with a dash in the table. In case of equality between the average RPD values, both unilateral tests are performed. For the second study, we consider the Wilcoxon–Mann–Whitney test with similar hypotheses, $H_0: \overline{\mathrm{RPD}}_i \geq \overline{\mathrm{RPD}}_j$ and $H_1: \overline{\mathrm{RPD}}_i < \overline{\mathrm{RPD}}_j$, with $i, j \in \{\text{dABC}, \text{nABC}\}$ and $i \neq j$. The p-values obtained are shown in Table 9 with the same notation as in Table 8.
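The authors used the test implementation from the performance assessment tool [69, 70]. As an independent illustration only, the following sketch computes a one-sided Wilcoxon–Mann–Whitney p-value for the alternative "values in x tend to be lower than values in y", using the usual normal approximation (midranks for ties, tie-corrected variance, continuity correction). The function names are hypothetical; in practice, `scipy.stats.mannwhitneyu(x, y, alternative="less")` provides the same test.

```python
import math


def midranks(values):
    """Ranks 1..n of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_less(x, y):
    """One-sided p-value for H1: values in x tend to be lower than values in y.

    Normal approximation with a tie-corrected variance and a continuity
    correction, which is reasonable for about 30 runs per sample.
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of sample x
    mean_u = n1 * n2 / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:
        return 1.0  # all values identical: no evidence for H1
    z = (u1 - mean_u + 0.5) / math.sqrt(var_u)
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
```

For example, `mann_whitney_less(rpd_n, rpd_d)`, with hypothetical lists holding the per-run RPD values of the alternative and default versions, returns a p-value below 0.05 when the lower RPD of the alternative version is significant at the 0.95 confidence level.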
The differences observed regarding execution time for both algorithms are also analysed through the Wilcoxon–Mann–Whitney test. For the first study, we define the hypotheses $H_0: \overline{T}_i \geq \overline{T}_j$ and $H_1: \overline{T}_i < \overline{T}_j$, with $i, j \in \{\text{dACO}, \text{nACO}\}$ and $i \neq j$, where $\overline{T}_i$ and $\overline{T}_j$ are the average execution times of algorithms $i$ and $j$ for a given instance, respectively. The p-values obtained for each instance and algorithm are shown in Table 8 under the title Execution time analysis, where p-values are given in bold with the same criterion as before. For the second study, we consider the Wilcoxon–Mann–Whitney test with similar hypotheses, $H_0: \overline{T}_i \geq \overline{T}_j$ and $H_1: \overline{T}_i < \overline{T}_j$, with $i, j \in \{\text{dABC}, \text{nABC}\}$ and $i \neq j$. The p-values obtained are shown in Table 9.
Based on the previous statistical analysis, Table 10 shows the percentage of cases where an algorithm provides significantly better performance than the other for each study in terms of RPD and execution time. Focusing on the first study and RPD values, we verify that nACO provides better behaviour than dACO in of cases. Moreover, dACO never provides better results than nACO, meaning that nACO clearly outperforms dACO. For execution time, nACO needs lower execution times than dACO in of cases and dACO needs lower execution times than nACO in of cases. This fact could mean that the alternative heuristic information needs higher execution times under certain conditions. However, most cases in which dACO needs lower execution times correspond with instances whose optimal solution is not reached, so that each algorithm performs all the evaluations allowed, e.g., for instances , , , , , , , , , , and . In such unfavourable cases, the differences observed are not of concern, as shown in Figure 2(a). On the other hand, most cases in which nACO needs lower execution times correspond with instances whose optimal solution is reached. In such cases, a greater difference is observed in favour of nACO because nACO reaches the stop condition before dACO, e.g., for instance sets 4, 5, 6, d, nre, and nrg. Focusing on the second study, we verify that nABC provides better behaviour than dABC in of cases, whereas dABC outperforms nABC in 7.27% of cases. This unfavourable situation occurs in small instances, where the search space is reduced. For execution time, nABC needs lower execution times than dABC in of cases and dABC needs lower execution times than nABC in of cases. As before, this unfavourable situation mainly occurs when nABC does not reach the optimal solution, in which case the additional computation required by the gain concept of the alternative SCP formulation penalizes nABC. This behaviour is shown in Figure 2(b).
Figure 2: Execution time comparison. (a) dACO vs. nACO. (b) dABC vs. nABC.
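The percentages in Table 10 can be derived from the one-sided p-values of Tables 8 and 9. The sketch below assumes a simple counting rule (an algorithm wins an instance when its one-sided p-value is below 0.05) and a hypothetical input layout in which `None` stands for a test marked with a dash; it is an illustration, not the authors' script.

```python
def significance_percentages(p_values, alpha=0.05):
    """Percentage of instances on which A, and on which B, is significantly better.

    p_values maps an instance name to a pair (p_a, p_b): the one-sided
    p-values for 'A better than B' and 'B better than A'. None marks a test
    that was not performed (shown as a dash in the tables).
    """
    n = len(p_values)
    a_wins = sum(1 for p_a, _ in p_values.values() if p_a is not None and p_a < alpha)
    b_wins = sum(1 for _, p_b in p_values.values() if p_b is not None and p_b < alpha)
    return 100.0 * a_wins / n, 100.0 * b_wins / n


# Hypothetical p-values for four instances (A = nACO, B = dACO).
example = {
    "inst1": (0.01, None),
    "inst2": (None, 0.03),
    "inst3": (0.20, None),
    "inst4": (0.04, None),
}
print(significance_percentages(example))  # (50.0, 25.0)
```

Instances where neither one-sided test is significant count for neither algorithm, which is why the two percentages need not add up to 100%.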
The previous analysis is completed with the landscape study in Tables 11 and 12 for the solutions obtained when solving the instances with ACO and ABC, respectively. The metrics in these tables quantify solution quality (QMetric), the rate of success (SRate), and the speed of reaching a solution (SSpeed). QMetric follows an exponential formulation that allows distinguishing between the performance of two algorithms that obtain solutions close to the optimum fitness. SRate is defined as the number of runs in which the algorithm reaches the optimum fitness divided by the total number of runs. SSpeed quantifies the number of evaluations taken to reach the optimum fitness. For the three metrics, the value 1 indicates the highest quality. More details about the three metrics, including their formulations, are given in [71]. Analysing Tables 11 and 12, we observe that, in general, both nACO and nABC provide a higher or equal QMetric than the dACO and dABC approaches. For SSpeed, we observe that both nACO and nABC need a lower or equal number of evaluations than dACO and dABC to reach the optimum fitness. For SRate, we observe that both nACO and nABC reach the optimum fitness a greater or equal number of times than dACO and dABC. That means that nACO and nABC obtain better quality solutions, converge better, and provide a more robust performance than the default approaches.
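Of the three landscape metrics, only SRate is fully defined in the text, so only SRate is sketched here; the QMetric and SSpeed formulations are given in [71] and are not reproduced. The function name and the sample costs are hypothetical.

```python
def success_rate(best_costs, optimum):
    """SRate: fraction of runs whose best solution reaches the optimum cost."""
    if not best_costs:
        raise ValueError("at least one run is required")
    successes = sum(1 for cost in best_costs if cost == optimum)
    return successes / len(best_costs)


print(success_rate([429, 430, 429, 429], 429))  # 0.75
```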
Up to this point, we know that the concepts included in the alternative formulation positively affect the search process in the first study, where they are used for guiding purposes. A mapping of the solutions visited by nACO and dACO could help to show how the two approaches explore the search space. To this end, we consider the mapping method (MaM) proposed by Autuori et al. [72], where a mapping function converts a solution from a multidimensional space into a one-dimensional space in two steps: (i) a binary conversion is applied to the solution (for SCP, the binary encoding is straightforward: a column takes the value 1 if it is included in the solution and 0 otherwise) and (ii) the binary Hamming distance between this binary solution and a randomly generated reference solution is calculated, resulting in a new representation of the solution. This representation is used for identifying the different zones explored by the algorithms. Note that the number of zones corresponds with the number of elements in the binary encoding (the number of columns). Then, all the binary representations are added, resulting in a frequency diagram that shows how often the algorithm includes each column in a solution.
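The two mapping steps and the frequency diagram can be sketched as follows, under a simplified reading of MaM: solutions are sets of column indices, the random reference solution is drawn with a fixed seed (an assumption, for reproducibility), and the frequency diagram is the element-wise sum of the binary encodings. Function names are hypothetical; the full method is described in [72].

```python
import random


def to_binary(solution, n_columns):
    """Binary encoding: position j is 1 if column j is in the solution."""
    return [1 if j in solution else 0 for j in range(n_columns)]


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def mam_map(solutions, n_columns, seed=0):
    """Map each solution to its Hamming distance from one random reference."""
    rng = random.Random(seed)
    reference = [rng.randint(0, 1) for _ in range(n_columns)]
    return [hamming(to_binary(s, n_columns), reference) for s in solutions]


def column_frequencies(solutions, n_columns):
    """Element-wise sum of the binary encodings of all visited solutions."""
    freq = [0] * n_columns
    for s in solutions:
        for j, bit in enumerate(to_binary(s, n_columns)):
            freq[j] += bit
    return freq


visited = [{0, 2}, {2}, {1, 2}]  # hypothetical solutions over 4 columns
print(mam_map(visited, 4))
print(column_frequencies(visited, 4))  # [1, 1, 3, 0]
```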
Table 13 shows three metrics analysing the frequency diagrams previously generated for each ACO approach and one instance from each instance set 4, 5, 6, a, b, c, d, nre, nrf, nrg, and nrh. Note that the frequency diagrams were generated using all the solutions built in the 30 runs of each algorithm. The metrics, also proposed by Autuori et al. [72], are (i) the number of unexplored zones, (ii) the number of explored zones, and (iii) the number of large explored zones. The metrics are related as follows. Convergence is considered high if is few in number. Diversity is considered good if the number of is large. Analysing Table 13, we observe that is usually higher for nACO than for dACO, meaning that nACO improves diversity during the search compared to the traditional approach. This fact is especially relevant for large instances, such as nrg_1 and nrh_1, where metric is significantly large. Focusing on , we observe that the differences between both approaches are not as pronounced as for . However, such differences could mean that dACO has a lack of convergence during the search, which future work should address.
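Given a frequency diagram such as the column-frequency vector described above, the three zone counts can be sketched as below. The sketch assumes that a zone is explored when its frequency is nonzero and introduces a hypothetical `large_threshold` parameter for calling an explored zone large; the precise definitions are those of Autuori et al. [72].

```python
def zone_metrics(freq, large_threshold):
    """Count unexplored, explored and large explored zones of a frequency diagram.

    large_threshold is a hypothetical minimum frequency for an explored
    zone to count as large.
    """
    if large_threshold < 1:
        raise ValueError("large_threshold must be at least 1")
    unexplored = sum(1 for f in freq if f == 0)
    explored = len(freq) - unexplored
    large_explored = sum(1 for f in freq if f >= large_threshold)
    return unexplored, explored, large_explored


print(zone_metrics([1, 1, 3, 0], large_threshold=2))  # (1, 3, 1)
```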
From these two studies solving the traditional SCP, we verify that (i) the concepts from the alternative formulation are useful for guiding the search process of a metaheuristic and (ii) they are useful for updating a usual heuristic feasibility operator from the literature. The improvement in both studies was observed in terms of solution quality, rate of success, and speed of reaching a solution. Conclusion (ii) is especially interesting because this type of operator is widely applied in metaheuristics that generate unfeasible solutions, and the proposal could be directly incorporated into many solving methods.
7. Conclusions and Future Scope
Traditionally, SCP is formulated without addressing two issues: solution unsatisfiability and set redundancy, meaning that the solving method has to implement mechanisms to control such aspects. In recent years, an alternative SCP formulation was proposed, whose main contribution was that both issues were directly addressed by including penalties in the fitness function.
Reviewing the current scientific literature, we observe that the alternative SCP formulation has received limited attention. Hence, we question whether there is any advantage in using this formulation beyond addressing set redundancy and feasibility aspects. This idea led us to propose two studies based on a metaheuristic approach. The aim is to identify whether any concept in the alternative formulation could be considered for enhancing a solving method using the traditional formulation. The first study considers an ACO algorithm in two contexts: (i) solving the problem by addressing the traditional SCP formulation and (ii) solving SCP by addressing the traditional formulation but using concepts from the alternative one for guiding the search. The second study considers an ABC algorithm in two contexts: (i) solving SCP by addressing the traditional formulation and (ii) solving SCP by addressing the traditional formulation but including concepts from the alternative one for updating a usual heuristic feasibility operator from the literature.
As a result of the first study, the authors conclude that it is possible to consider the gain concept from the alternative SCP formulation to successfully guide ACO search addressing the traditional SCP formulation. The benefits of the novel guide are shown in terms of solution quality, convergence, execution time, and diversity. From the second study, the authors conclude that it is possible to consider the gain concept from the alternative SCP formulation to update a feasibility operator from the literature. The benefits of the novel feasibility operator are shown in terms of solution quality, execution time, and convergence.
The first conclusion is interesting for designing novel guide strategies for solving the traditional SCP formulation. The second conclusion is especially interesting because feasibility operators are widely used in metaheuristic approaches solving the SCP, since these approaches usually generate unfeasible solutions. This type of operator is integrated into the solving method as a black-box method, so it is straightforward to interchange one operator for another. Hence, the feasibility operator based on the alternative SCP formulation presented here could be integrated with reduced effort into already published works from the literature, as well as into future works, to evaluate each specific use case.
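The black-box argument can be made concrete with a small sketch. Here a feasibility operator is just a function from a candidate solution (a set of column indices) to a feasible solution, so a metaheuristic that calls it does not depend on how the operator works. The operator shown is a generic add-then-drop greedy repair chosen only for illustration; it is neither the default operator of the second study nor the gain-based operator proposed in this paper, and all names (`make_greedy_repair`, `run_with`, `rows_of_col`) are hypothetical.

```python
from typing import Callable

# A feasibility operator maps a (possibly unfeasible) solution, given as a set
# of column indices, to a feasible one. The metaheuristic depends only on this
# signature, so operators can be swapped like black boxes.
Solution = set[int]
FeasibilityOperator = Callable[[Solution], Solution]


def make_greedy_repair(rows_of_col: list[set[int]], cost: list[float],
                       n_rows: int) -> FeasibilityOperator:
    """Build a hypothetical add-then-drop greedy repair operator.

    rows_of_col[j] is the set of rows (0..n_rows-1) covered by column j.
    """
    def repair(solution: Solution) -> Solution:
        sol = set(solution)
        covered = set().union(*(rows_of_col[j] for j in sol))
        # Add phase: while rows remain uncovered, add the column with the
        # lowest cost per newly covered row (min raises ValueError if no
        # column can cover the remaining rows).
        while len(covered) < n_rows:
            best = min(
                (j for j in range(len(cost)) if rows_of_col[j] - covered),
                key=lambda j: cost[j] / len(rows_of_col[j] - covered),
            )
            sol.add(best)
            covered |= rows_of_col[best]
        # Drop phase: remove redundant columns, most expensive first.
        for j in sorted(sol, key=lambda c: -cost[c]):
            rest = set().union(*(rows_of_col[k] for k in sol if k != j))
            if len(rest) == n_rows:
                sol.discard(j)
        return sol

    return repair


def run_with(operator: FeasibilityOperator, candidate: Solution) -> Solution:
    """Stand-in for the point where a metaheuristic calls its operator."""
    return operator(candidate)


rows_of_col = [{0, 1}, {1, 2}, {2}, {0, 1, 2}]
cost = [1.0, 1.0, 1.0, 5.0]
greedy = make_greedy_repair(rows_of_col, cost, n_rows=3)
print(run_with(greedy, set()))   # {0, 1}
print(run_with(greedy, {0, 3}))  # {3}
```

Swapping in a different operator only requires passing another function with the same signature to `run_with`; the surrounding search loop is unchanged.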
As future lines of research, it would be interesting to include additional metaheuristics in this study, as well as a larger dataset with bigger problems. Additionally, this work could be extended by taking into account the performance of the solving methods and the search space complexity of the instances based on the landscape metrics.
Data Availability
The results shown in this paper were obtained by solving some freely available datasets in the literature. They can be found at http://people.brunel.ac.uk/∼mastjjb/jeb/orlib/scpinfo.html.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to acknowledge the following grants: Juan A. Gomez-Pulido is supported by grant IB16002 (Junta Extremadura, Spain), Broderick Crawford is supported by grant CONICYT/FONDECYT/REGULAR/1171243, and Ricardo Soto is supported by grant CONICYT/FONDECYT/REGULAR/1190129. The authors also thank “Proyecto CORFO 14ENI226905 Nueva Ingeniería para el 2030.”