Abstract

In this paper, complexity curtailing techniques are introduced to create faster versions of insertion heuristics, that is, the cheapest insertion heuristic (CIH) and the largest insertion heuristic (LIH), effectively reducing their worst case complexities from $O(n^3)$ to $O(n^2)$ with no significant effect on the quality of the solution. This paper also examines the relatively little-known heuristic concept of max difference and shows that it can be developed into a full-fledged max difference insertion heuristic (MDIH) by defining its missing steps. Furthermore, the paper extends the complexity curtailing techniques to MDIH to create its faster version. The resultant heuristic, that is, the fast max difference insertion heuristic (FMDIH), outperforms the “farthest insertion” heuristic (FIH) across a wide spectrum of popular datasets with statistical significance, even though both heuristics have the same worst case complexity of $O(n^2)$. It should be noted that FIH is considered the best among the lowest order complexity heuristics. The complexity curtailing techniques presented here open up a new area of research on their possible extension to other heuristics.

1. Introduction

The Traveling Salesman Problem (TSP) is one of the most studied problems in the scientific literature and is sometimes referred to as the mother of all combinatorial optimization problems. It continues to be a testing ground for the development of combinatorial optimization methods, while having numerous practical applications in diverse areas, including logistics, genetics, manufacturing, telecommunications, and neuroscience [1]. Examples of real-world application domains with problems that can be naturally formulated as the TSP include VLSI design, vehicle routing, data clustering, and job-shop scheduling [2]. The earliest reference to the TSP can be found in the 1832 German handbook for travelling salesmen [1]. The problem consists of finding the shortest possible tour of a set of cities, such that the tour starts from and ends at the same city and visits each of the remaining cities precisely once. The problem is simple to state but has proved to be intractable: it is NP-hard, so a polynomial time solution method would settle the P versus NP question, which is one of the seven “millennium prize problems” described by the Clay Mathematics Institute and carries a prize of one million US dollars.

Meanwhile, research and practice in the TSP have focused on heuristic methods that yield fast approximate solutions. These heuristic methods tend to cluster into three main groups: (i) tour construction heuristics, (ii) tour improvement heuristics, and (iii) composite heuristics (which combine elements of both tour construction and tour improvement). A tour construction heuristic constructs the tour from scratch, beginning with one city and iteratively expanding the subtour by one city at a time. In contrast, a tour improvement heuristic begins with a complete tour and makes one or more rearrangements in an attempt to improve it. A vast space of “composite” heuristics blurs the distinction between these two categories by using elements of both; for example, in the highly successful Lin-Kernighan heuristic [3], the context is that of iterated tour improvement; however, the improvement process consists of repeated construction of full solutions from partial solutions. The Concorde software code [4], which has solved most problems in TSPLIB to optimality, uses the Lin-Kernighan heuristic to find near-optimal tours and then applies various mathematical programming methodologies to achieve optimality.

With no general polynomial time solution method yet available, the quest for the development of new heuristics for the TSP remains active. Tour construction heuristics have played an important role in this quest, in particular because they can form components of a wide range of approaches. For example, they can be used to construct the starting solutions for tour improvement heuristics [5], and they can provide rough estimates of the cost of optimal solutions. In turn, they provide interesting analytical grounds for the study of upper bounds on solution quality [6] and provide material for empirical studies [7, 8] that attempt to understand how optimal solutions differ from solutions that arise from tour construction heuristics. Such studies lead to a better understanding of the structure of optimal solutions of the TSP, a problem that has remained at the centre of our intellectual curiosity for centuries.

The basic procedure of tour construction heuristics can be summarised as follows:
(1) Establishment of the initial small subtour (subtour establishment rule).
(2) Selection of a city not present in the current subtour (selection rule).
(3) Expansion of the current subtour to include the selected city (expansion rule).
(4) Iterative application of steps (2) and (3) until a complete tour is obtained.
In the repeated application of steps (2) and (3), the subtour is “successively augmented” with the insertion of a new city; this is why tour construction heuristics are sometimes also called “successive augmentation” heuristics (see, e.g., [9]). Different tour construction heuristics are characterised by the specific methods they choose for each of the three steps: the initial subtour establishment, selection, and expansion rules. Expansion rules can be grouped into two main types: insertion and addition. An insertion-based expansion rule chooses where in the permutation to place the new city on the basis of the cost of the resulting subtour, whereas an addition-based expansion rule bases this decision on next-hop distance. It has been reported that insertion heuristics generally perform better than addition heuristics (see, e.g., [8]); we have therefore chosen to focus on insertion-based expansion heuristics in this paper.

Three insertion heuristics are particularly prominent in the literature, namely, nearest, farthest, and cheapest [10]. We consider these three heuristics in the remainder of this work, along with a further two that are less well known. The first of these two is the “largest-cost insertion heuristic” (LIH) [11], which sits naturally alongside the aforementioned three, despite being rarely considered in the literature. The second is the “max difference insertion heuristic” (MDIH). Tunnel and Heath [12] presented “max difference” as a concept that could be attached to any other heuristic; however we argue that, with appropriate extensions that we later describe (resulting in MDIH), it is more appropriately seen as a tour construction heuristic in itself, and in fact we demonstrate that it is a particularly effective one.

The insertion heuristics under study can be divided into two groups, that is, distance based insertion heuristics (DBIH) and cost based insertion heuristics (CBIH). The DBIH have a city selection rule based on distance. This group includes the nearest (NIH) and farthest (FIH) insertion heuristics. The CBIH have a city selection rule based on the cost of insertion (see (5)). This group includes the cheapest (CIH), largest (LIH), and max difference (MDIH) insertion heuristics. The difference in selection rule brings about a difference in the worst case complexity of the heuristics, with DBIH having a worst case complexity of $O(n^2)$ and CBIH having a worst case complexity of $O(n^3)$ (see Section 4). This paper proposes techniques to curtail the complexity of CBIH to $O(n^2)$, effectively creating faster versions of CBIH, that is, FCIH, FLIH, and FMDIH, where the prefix “F” stands for “fast”.

Previously it has been reported that the farthest insertion heuristic (FIH) generally performs best among the insertion heuristics of $O(n^2)$ or lower time complexity (see, e.g., [13]). However, we will show that the performance of FMDIH is consistently better than that of FIH on a wide spectrum of popular datasets, even though the complexity of FMDIH is no greater than that of FIH.

The remainder of this paper is structured as follows. In Section 2, we describe four of the five insertion heuristics of interest: cheapest, largest, nearest, and farthest insertion. In Section 3 we describe the previously overlooked “max difference” concept and build on that to describe the “max difference insertion” heuristic. In Section 4, we discuss design and complexity issues for all five of the heuristics of interest, and in Section 5 we describe our complexity curtailing techniques, which produce accelerated and approximated variants of the cost based insertion heuristics. Section 6 provides summaries of empirical results, and we conclude with a statement of our main findings in Section 7. In the Appendix, we show the empirical results for the farthest and max difference insertion heuristics in finer detail.

2. Description of Tour Construction Heuristics

Below, we explain the design details of four of the five main heuristics considered in this paper. Our elaboration of the details follows the four-part structure given in Section 1 for tour construction heuristics. Being insertion heuristics, the most important elements are the city selection and subtour expansion methods. The city selection method is also salient for another reason, which is the fact that the name of a tour construction heuristic (e.g., cheapest and farthest) tends to reflect the nature of this rule. We come back to this fact in Section 3, when we speculate about why the “max difference” heuristic has been largely overlooked. The simple “initial subtour establishment” rule is also described for completeness and to facilitate replication of our experiments. Meanwhile, it is worth noting an interesting aspect of the city selection rule.

2.1. Initial Subtour Establishment

This is the first step in any tour construction heuristic. In each of the standard forms of the cheapest, largest, nearest, and farthest insertion heuristics, this initial “subtour” is simply a single city chosen uniformly at random.

2.2. City Selection

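Momentarily referring to the subtour expansion rule (in Section 2.3), we note that two of the heuristics of interest are based on cost, and two are based on distance. Where subtour expansion is based on cost, the city selection rule is as follows. Let $S$ be the set of cities in the subtour and let $U$ be the set of cities not in the subtour, while city $c \in U$ and $e(S, c)$ is the expansion cost of subtour $S$ by insertion of city $c$. Then, the city $c^{*} \in U$ is selected such that, for CIH,
$$e(S, c^{*}) = \min_{c \in U} e(S, c), \tag{1}$$
and, for LIH,
$$e(S, c^{*}) = \max_{c \in U} e(S, c). \tag{2}$$
When subtour expansion is instead based on distance, the city selection rule is different. Now, let $d(s, c)$ be the distance between cities $s \in S$ and $c \in U$; the city $c^{*} \in U$ is selected such that, for NIH,
$$\min_{s \in S} d(s, c^{*}) = \min_{c \in U} \min_{s \in S} d(s, c), \tag{3}$$
and, for FIH,
$$\min_{s \in S} d(s, c^{*}) = \max_{c \in U} \min_{s \in S} d(s, c). \tag{4}$$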

2.3. Subtour Expansion

As indicated in the introduction, subtour expansion heuristics can be based around either insertion or addition. We are only concerned with insertion-based variants in this paper and so will confine our description accordingly. Let $a, b \in S$ and $c \in U$, such that $a$ and $b$ are a pair of consecutive cities in $S$. Let $d(a, b)$ be the distance between the cities $a$ and $b$. The subtour is expanded by inserting the selected city $c$ in the subtour such that
$$e(S, c) = \min_{(a, b)} \left[ d(a, c) + d(c, b) - d(a, b) \right], \tag{5}$$
where the minimum is taken over all pairs $(a, b)$ of consecutive cities in $S$. This step is applicable to each of CIH, LIH, FIH, and NIH. From this point onwards, we refer to the edge related to $e(S, c)$ as the expansion cost edge. It should be noted that the “largest-cost insertion” heuristic has not yet been published; however, an unpublished paper about it is available on the internet [10]. We include it in our experiments, noting that it is the logical counterpart to CIH, in the same way that FIH is related to NIH. It therefore completes a set of four heuristics, of which two are based on minimum and maximum distance ((3)-(4)) while two are based on minimum and maximum expansion cost ((1)-(2)). We later find that LIH outperforms both CIH and NIH, despite its apparent absence from the literature.
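The selection rules (1)-(4) and the expansion rule (5) can be illustrated with a minimal Python sketch. It is deliberately unoptimised (every expansion cost is recomputed from scratch), it assumes cities are given as planar coordinates with Euclidean distances, and the function names `expansion_cost`, `select_city`, and `insertion_tour` are hypothetical names chosen for this illustration only.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as (x, y) pairs (an assumption)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def expansion_cost(points, tour, c):
    """Return (cost, i): cheapest insertion of city c between tour[i] and its successor, as in (5)."""
    best = (math.inf, -1)
    for i in range(len(tour)):
        a, b = tour[i], tour[(i + 1) % len(tour)]
        cost = dist(points[a], points[c]) + dist(points[c], points[b]) - dist(points[a], points[b])
        best = min(best, (cost, i))
    return best

def select_city(points, tour, outside, rule):
    """Pick the next city under rule 'CIH', 'LIH', 'NIH' or 'FIH' (rules (1)-(4))."""
    if rule in ("CIH", "LIH"):      # cost based selection
        key = lambda c: expansion_cost(points, tour, c)[0]
    else:                           # distance based selection
        key = lambda c: min(dist(points[c], points[s]) for s in tour)
    pick = min if rule in ("CIH", "NIH") else max
    return pick(outside, key=key)

def insertion_tour(points, rule, start=0):
    """Build a complete tour from a single start city by repeated selection and expansion."""
    tour = [start]
    outside = [c for c in range(len(points)) if c != start]
    while outside:
        c = select_city(points, tour, outside, rule)
        _, i = expansion_cost(points, tour, c)
        tour.insert(i + 1, c)       # expansion rule (5)
        outside.remove(c)
    return tour
```

For example, `insertion_tour(points, "FIH")` returns a permutation of the city indices starting at city 0; the four rules differ only in the `select_city` step.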

3. The Max Difference Insertion Heuristic (MDIH)

The idea behind the MDIH was introduced in a Master’s thesis [12] a quarter of a century ago and no research paper about this could be found in literature. The “max difference” concept was presented by Tunnel and Heath as a way to engineer new variants of existing heuristics (they applied it to two versions of CIH and also to Stewart’s Algorithm [14]). However, since (as we will see) the concept relates directly to how the city selection step is done, from which the names of tour construction heuristics seem invariably to be derived, we would argue that it merits reinvention as a fully fledged tour construction heuristic. Its performance in the form of MDIH, as detailed in the next section, certainly supports this view. Meanwhile we speculate that Tunnel and Heath may not have put it forward as a standalone new heuristic, based on the belief that it was the “insertion” (subtour expansion) rather than the “max difference” (city selection) that was salient in such categorization; in turn, this seems to have led to its being unnoticed, despite its comparatively strong performance among similar heuristics.

We now explain the key “city selection” step for MDIH. First, we need some helper definitions. The first of these is the function “$k$th minimum,” denoted as $\min_k(X)$. The meaning of “$k$th minimum” is that if all the values in set $X$ are sorted from minimum to maximum then this function represents the $k$th value in that list. Now, the “$k$th expansion cost” of a subtour by the insertion of any city not in that subtour is equal to the function $\min_k(X)$, if the set $X$ is the set of all the expansion costs of that subtour on all of its edges caused by the insertion of that city. Now let $a, b \in S$ and $c \in U$, such that $a$ and $b$ are a pair of consecutive cities in $S$. Let $d(a, b)$ be the distance between the cities $a$ and $b$. Let $e_k(S, c)$ represent the $k$th expansion cost of the subtour $S$ by the insertion of city $c$; then, by definition,
$$e_k(S, c) = {\min}_k \left\{ d(a, c) + d(c, b) - d(a, b) : (a, b) \text{ consecutive in } S \right\}. \tag{6}$$
It should be noted that, apart from here, this function is used only in Sections 5.2 and 5.3. From this point onwards the edge related to $e_k(S, c)$ is referred to as the $k$th expansion cost edge. In the city selection rule of MDIH the city $c^{*}$ is selected such that the difference between its expansion cost and its 2nd expansion cost is maximum among all the cities not in the tour; that is,
$$e_2(S, c^{*}) - e_1(S, c^{*}) = \max_{c \in U} \left[ e_2(S, c) - e_1(S, c) \right]. \tag{7}$$
It should be noted that there is no difference between $e_1(S, c)$ and $e(S, c)$. The 3rd step (subtour expansion) follows the same rule as represented by (5).
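The $k$th expansion cost and the max difference selection rule can be sketched in Python as follows. This is an illustrative, unoptimised reading of (6) and (7) that assumes planar coordinates with Euclidean distances; the names `expansion_costs`, `kth_expansion_cost`, and `mdih_select` are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points (an assumption of this sketch)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def expansion_costs(points, tour, c):
    """All expansion costs of inserting city c, one per subtour edge, sorted ascending."""
    n = len(tour)
    costs = []
    for i in range(n):
        a, b = tour[i], tour[(i + 1) % n]
        costs.append(dist(points[a], points[c]) + dist(points[c], points[b])
                     - dist(points[a], points[b]))
    return sorted(costs)

def kth_expansion_cost(points, tour, c, k):
    """The k-th smallest expansion cost of city c (k = 1 is the ordinary expansion cost)."""
    return expansion_costs(points, tour, c)[k - 1]

def mdih_select(points, tour, outside):
    """Select the outside city with the largest gap between its 2nd and 1st expansion cost."""
    def difference(c):
        costs = expansion_costs(points, tour, c)
        return costs[1] - costs[0]
    return max(outside, key=difference)
```

The rule needs at least two distinct edges to compare, which is why a subtour of at least three cities is required before it can be applied. Intuitively, a city with a large gap has one clearly preferable insertion edge and stands to lose the most if that edge is taken by another city first.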

Finally, to present MDIH as a complete tour construction heuristic in itself, we also explore five variants of the subtour establishment rule (step (1)). Unlike the other insertion heuristics considered in this paper, the city selection step of MDIH requires an initial subtour containing at least three cities. We therefore test five subtour establishment rules: (i) a subtour of three randomly chosen cities; (ii) a subtour first formed by two randomly chosen cities, with the third city then chosen based on cheapest expansion; (iii) a subtour first formed by two randomly chosen cities, with the third city then chosen based on largest-cost expansion; (iv) a subtour initially formed by one randomly chosen city, with the next two chosen iteratively based on cheapest expansion; (v) a subtour first formed by one randomly chosen city, with the next two chosen iteratively based on largest-cost expansion. Associated experiments and results are discussed later in Section 6.

4. Design and Complexity Aspects of Tour Construction Heuristics

The design of any heuristic consists of the design of data structures and algorithms for its efficient implementation. Our implementation centres around two key data structures: a doubly circular linked list representing the set of cities not in the subtour and a singly circular linked list representing the subtour. For the set $U$ of cities not in the subtour, the doubly circular linked list allows for a very simple and fast city deletion operation. For the set $S$ of cities representing the subtour, a singly circular linked list is sufficient since only an insertion operation is needed here. Initially, a doubly circular linked list is prepared representing the set of cities not in the subtour. From this list, cities are deleted one by one and correspondingly inserted in the singly circular linked list representing the subtour at each step. The procedure continues until the doubly circular linked list becomes empty.

The design of step (2) (city selection) of DBIH, that is, the nearest and farthest insertion heuristics ((3)-(4)), consists of finding the nearest neighbour city present in the subtour for each of the cities not in the subtour. To reduce computation costs, the nearest neighbour city in the subtour for each city not in the subtour is recorded in each iteration, and in the next iteration only the distance to the most recently added city is compared with the distance to the nearest neighbour from the previous iteration, thus updating the nearest neighbour information for the current iteration. Therefore, FIH and NIH can be implemented within a time complexity of $O(n^2)$.
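The nearest-neighbour caching described above can be sketched in Python for FIH as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes planar coordinates with Euclidean distances, uses a plain list and a dictionary in place of the linked lists, and the names `farthest_insertion`, `near`, and `tour_length` are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points (an assumption of this sketch)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def farthest_insertion(points, start=0):
    """Farthest insertion with cached nearest-subtour distances."""
    tour = [start]
    # near[c]: distance from outside city c to its nearest city in the subtour
    near = {c: dist(points[c], points[start])
            for c in range(len(points)) if c != start}
    while near:
        c = max(near, key=near.get)          # selection rule (4)
        del near[c]
        best_i, best_cost = 0, math.inf      # expansion rule (5): scan every edge once
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            cost = dist(points[a], points[c]) + dist(points[c], points[b]) - dist(points[a], points[b])
            if cost < best_cost:
                best_i, best_cost = i, cost
        tour.insert(best_i + 1, c)
        # only the newly added city can shorten a cached nearest distance
        for u in near:
            near[u] = min(near[u], dist(points[u], points[c]))
    return tour

def tour_length(points, tour):
    """Length of the closed tour."""
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

Each iteration performs one pass over the outside cities and one pass over the subtour edges, so the total work grows quadratically with the number of cities. Replacing `max` with `min` in the selection step gives the corresponding sketch for NIH.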

In the case of CBIH, that is, the cheapest and largest insertion heuristics, the design of the city selection step ((1) and (2)) consists of finding the expansion cost for insertion into the subtour of each city not in the subtour. This implies that the expansion cost needs to be computed for each city not in the subtour; in turn, computation of the expansion cost requires visits to each and every edge of the subtour. To make this efficient, information should be preserved between iterations, and in each new iteration the expansion cost of only the newly added edges is computed and compared with the expansion costs of the previous iteration to update the expansion cost information. However, this shortcut may not be possible for all the cities not in the subtour. This is because some of the cities may have had their expansion cost in the previous iteration at an edge which is broken in the current iteration. This means the information about the expansion cost of those cities is no longer valid, since the relevant edge for the expansion does not exist. For such cities, we need to compute the expansion costs for the newly formed edges, and if this expansion cost is less than or equal to their expansion cost in the previous iteration, then their new expansion cost has been found on the newly formed edges. However, if not, we are forced to recompute the expansion cost of these cities on all edges of the current subtour to find their new expansion cost. Therefore the worst case complexity of CIH and LIH is $O(n^3)$. The average complexity of CIH is reported to be of the order $O(n^2 \log n)$ [14].

To design the city selection step for MDIH, for each city not in the subtour both the expansion cost and the 2nd expansion cost need to be computed (see (7)). However, both of these quantities can be computed simultaneously in a single scan of the subtour. To implement the step efficiently, these quantities should be preserved between iterations for each city not in the subtour, and in the new iteration only the expansion costs on the recently added edges are computed and compared with those of the previous iteration to update them. However, just as was the case for CIH and LIH, this shortcut is not always possible; for some cities the expansion costs at all edges of the subtour need to be computed. Therefore the worst case complexity of MDIH is the same as that of the cheapest and largest insertion heuristics, that is, $O(n^3)$.

5. Complexity Curtailing Techniques for Cost Based Insertion Heuristics

In this section, complexity curtailing techniques are introduced to create fast and approximate variants of CIH, LIH, and MDIH.

5.1. Design of Fast Cheapest Insertion Heuristic (FCIH)

We can devise a new and faster heuristic based on CIH on the basis of a simple geometrical principle. Consider the subtour shown in Figure 1.

In Figure 1 it can be seen that three cities are yet to be inserted. For each of these cities, the cheapest insertion edge is $ab$. The final shape of the tour depends on the order in which these three cities are inserted. If the city lying inside the subtour is inserted first, then edge $ab$ will be deleted and the two edges joining that city to $a$ and to $b$ will be added. In the new subtour it is now a genuine possibility that the cheapest insertion for one of the remaining cities falls on some other, older edge of the subtour, since edge $ab$ is no longer present. In this case, we will need to traverse the entire current subtour to determine the new cheapest insertion for that city. However, if one of the cities lying outside the subtour had been inserted first instead, then edge $ab$ is deleted and the two new edges joining that city to $a$ and to $b$ are added. In this case the most likely situation is that the next outside city is inserted in one of the newly added edges, requiring only inspection of the two newly added edges.

In interim summary, if we encounter the scenario in Figure 1 during the construction of a tour using CIH, then, for the sake of computational efficiency, we would hope that the city lying inside the subtour is not chosen as the first of the three to be inserted. However, some reflection will reveal that the position of this city is unlikely in the first place. It is well known that CIH grows tours “outwardly.” This means that the cities not in the subtour lie outside the periphery of the subtour, rather than inside it, as is the case for one of the cities in Figure 1. By exploiting this property, we can propose a variant of CIH that only inspects the recently added edges in each iteration, even for the cities whose earlier cheapest edge is broken. Though no longer guaranteeing the “cheapest” insertion, intuition suggests that this would lead to a favourable tradeoff, with at most a minor loss of quality set against a significant benefit in speed. In this “fast CIH” heuristic, two expansion costs are computed in each iteration for every city not yet in the subtour, so the total number of computations of expansion costs (now independent of details of the dataset) is
$$\sum_{k=1}^{n-1} 2k = n(n-1). \tag{8}$$
Meanwhile, since one expansion cost involves traversal of 3 edges, the total number of edges traversed for computation of expansion costs is given by
$$3n(n-1). \tag{9}$$
The time complexity of “fast CIH” is therefore $O(n^2)$, which is below the average complexity of CIH. Note that fast CIH runs in the order $O(n^2)$ on any dataset (with best case, worst case, and average case scenarios all equivalent). A full procedural summary of this new “fast cheapest insertion” heuristic (FCIH) is given in Procedure 1.

Procedure FCIH starts
Form circular linked list $U$ of cities not in the subtour
Form empty linked list $S$
Choose an arbitrary city $c$
Delete $c$ from the list $U$
Insert $c$ in the empty list $S$
Initialize expansion costs $e(S, u)$ of subtour $S$ by insertion w.r.t. all $u$ in $U$ to very high value
while $U$ is not empty do begin
     do begin (Visit all $u$ in $U$)
          Let $u \in U$
          $x_1$ = Cost of insertion of city $u$ on 1st newly formed edge
          $x_2$ = Cost of insertion of city $u$ on 2nd newly formed edge
          $x_3$ = $e(S, u)$
          If   expansion cost edge of $u$ is broken in last iteration
          then   $e(S, u) = \min(x_1, x_2)$
          else   $e(S, u) = \min(x_1, x_2, x_3)$
          end if
     while all $u$ in $U$ not visited
     Choose city $c$ in $U$ such that $e(S, c) = \min_{u \in U} e(S, u)$
     Delete $c$ from the list $U$
     Insert $c$ in $S$ on the edge connected to $e(S, c)$
end while
Procedure FCIH ends

The first five lines of “Procedure 1” establish the initial tour. Expansion of this initial tour is then implemented in the remainder of the procedure. The inner loop in the remainder is used to compute the expansion cost $e(S, u)$ for each city $u$ (the definition of this cost is given in (5)). However, a shortcut is used to compute this cost, by computing the expansion cost on only the two newly formed edges. There are two potential scenarios associated with this shortcut: in the first scenario, the edge related to the expansion cost of this city in the previous iteration is broken, and in the second scenario it has survived. In the first scenario, the new expansion cost of the city $u$ in the current iteration is assumed to be the minimum of the two expansion costs of city $u$ computed on the two newly formed edges; in the second scenario, the difference is that we also consider the expansion cost of city $u$ computed in the previous iteration. The condition “while all $u$ in $U$ not visited” ensures that the cost of all uninserted cities is computed. The outer loop chooses the city for insertion that has the lowest expansion cost.
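A minimal runnable Python sketch of Procedure 1 is given below. It is one possible reading of the procedure, not the C/C++ implementation used in the experiments. It assumes planar coordinates with Euclidean distances, replaces the linked lists with dictionaries (`succ` for the subtour, `best` for the recorded expansion cost and edge of each outside city), and treats the single starting city as a degenerate edge from itself to itself; all of these names and choices are assumptions of the sketch.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points (an assumption of this sketch)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fcih(points, start=0):
    """Fast cheapest insertion: only the two newest edges are inspected per city."""
    def cost(a, b, u):
        # cost of inserting u on the directed subtour edge (a, b), as in (5)
        return dist(points[a], points[u]) + dist(points[u], points[b]) - dist(points[a], points[b])

    succ = {start: start}                     # successor map: singly circular list
    # best[u] = (recorded expansion cost, expansion cost edge) for each outside city
    best = {u: (cost(start, start, u), (start, start))
            for u in range(len(points)) if u != start}
    while best:
        c = min(best, key=lambda u: best[u][0])   # lowest recorded expansion cost
        _, (a, b) = best.pop(c)
        succ[a], succ[c] = c, b                   # edge (a, b) is broken here
        for u, old in best.items():
            new = min((cost(a, c, u), (a, c)), (cost(c, b, u), (c, b)))
            # keep the old record only if its edge survived this insertion
            best[u] = new if old[1] == (a, b) else min(old, new)
    tour, city = [start], succ[start]
    while city != start:                          # walk the circular list once
        tour.append(city)
        city = succ[city]
    return tour
```

Every outside city is touched exactly once per iteration and only two costs are evaluated for it, so the amount of work does not depend on the geometry of the instance.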

5.2. Design of Fast Largest Insertion Heuristic (FLIH)

In contrast to CIH, tours constructed by LIH do not grow “outwardly,” and the situation in Figure 1, where one of the currently unvisited cities lies inside the periphery of the current subtour, is a common occurrence. We cannot therefore design a faster version of LIH on the same basis used for FCIH. However, in the case of LIH a different approximate assumption can be made which may help avoid computing expansion costs on all edges of the current subtour. Referring again to Figure 1: if the city lying inside the subtour is inserted first, a new subtour is formed, breaking the edge $ab$ as a result. Now in this case, as discussed earlier, another uninserted city might not make its cheapest insertion with either of the two newly formed edges. However, there is a genuine possibility that this city makes its 2nd cheapest insertion with one of them. This leads us to the idea of a fast version of LIH in which we approximate, rather than guarantee, the cheapest insertion, while scanning at most four edges in each iteration of the construction process. In detail, in each iteration we only need to keep a record of the 2nd expansion cost in addition to the (1st) expansion cost for each city not in the subtour. In turn, we find the expansion cost and 2nd expansion cost of a city by scanning at most four edges: the two newly formed edges, along with the expansion cost edge and 2nd expansion cost edge of the previous iteration. If either of the latter two was broken in the previous iteration, then we can simply use the remaining three edges to (approximately) update the expansion cost and 2nd expansion cost in the new iteration. Again, the resulting heuristic, FLIH, has complexity $O(n^2)$, independently of details of the dataset. A procedural summary of the fast largest insertion heuristic (FLIH) is given in Procedure 2.

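Procedure FLIH starts
Form circular linked list $U$ of cities not in the subtour
Form empty linked list $S$
Choose an arbitrary city $c$
Delete $c$ from the list $U$
Insert $c$ in the empty list $S$
Initialize expansion costs $e_1(S, u)$ and 2nd expansion costs $e_2(S, u)$ w.r.t. all $u$ in $U$ to very high value
while $U$ is not empty do begin
     do begin (Visit all $u$ in $U$)
          Let $u \in U$
          $x_1$ = Cost of insertion of city $u$ on 1st newly formed edge
          $x_2$ = Cost of insertion of city $u$ on 2nd newly formed edge
          $x_3$ = $e_1(S, u)$
          $x_4$ = $e_2(S, u)$
          If   expansion cost edge of $u$ is broken in last iteration
          then   $e_k(S, u) = \min_k\{x_1, x_2, x_4\}$ for $k = 1, 2$
          else if   2nd expansion cost edge of $u$ is broken in last iteration
          then   $e_k(S, u) = \min_k\{x_1, x_2, x_3\}$ for $k = 1, 2$
          else   $e_k(S, u) = \min_k\{x_1, x_2, x_3, x_4\}$ for $k = 1, 2$
          end if
     while all $u$ in $U$ not visited
     Choose city $c$ in $U$ such that $e_1(S, c) = \max_{u \in U} e_1(S, u)$
     Delete $c$ from the list $U$
     Insert $c$ in $S$ on the edge connected to $e_1(S, c)$
end while
Procedure FLIH ends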

In common with Procedure 1, the first five lines of “Procedure 2” establish the initial subtour, while the remainder concerns its expansion. In the inner loop, the expansion cost and second expansion cost are calculated for each city $u$ (see (5) and (6), resp.). The costs are calculated on the basis of the two newly formed edges, and the detail of the procedure concerns three potential scenarios that may arise. In the first scenario, the edge related to the expansion cost of city $u$ in the previous iteration is broken. In the second, the edge related to the second expansion cost of city $u$ is broken; in the third scenario, none of the expansion edges is broken. In the first scenario the new expansion cost and second expansion cost of city $u$ are taken to be the two smallest of three quantities: the two expansion costs of city $u$ computed on the two newly formed edges and its second expansion cost in the last iteration. In the second scenario, the three quantities are now the two expansion costs computed on the two newly formed edges (as before) and its expansion cost computed in the previous iteration. Finally, in the third scenario, the two costs are taken as the smallest two of four quantities: the two expansion costs of city $u$ computed on the two newly formed edges and its expansion cost and second expansion cost computed in the earlier iteration. The condition “while all $u$ in $U$ not visited” ensures that the cost of all uninserted cities is computed. The outer loop chooses the city for insertion that has the largest expansion cost.
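The three scenarios reduce to a single rule: discard the recorded cost whose edge was broken, add the two costs on the newly formed edges, and keep the two smallest. A minimal Python sketch of this update step follows; the function name `update_two_best` and the representation of a record as a list of `(cost, edge)` pairs are assumptions made for illustration.

```python
def update_two_best(record, broken_edge, new_candidates):
    """Approximate update of the expansion cost and 2nd expansion cost of one city.

    record         -- up to two (cost, edge) pairs from the previous iteration, cheapest first
    broken_edge    -- the edge removed by the most recent insertion
    new_candidates -- the (cost, edge) pairs computed on the two newly formed edges
    """
    # scenarios 1 and 2: a recorded edge was broken, so its cost is no longer valid
    survivors = [item for item in record if item[1] != broken_edge]
    # scenario 3 falls out naturally: both old entries survive and four values compete
    return sorted(survivors + list(new_candidates))[:2]

# Example with symbolic edge labels (hypothetical values):
old = [(1.0, ("a", "b")), (2.0, ("b", "c"))]
new = [(1.5, ("a", "x")), (3.0, ("x", "b"))]
first_broken = update_two_best(old, ("a", "b"), new)    # uses x1, x2 and the old 2nd cost
second_broken = update_two_best(old, ("b", "c"), new)   # uses x1, x2 and the old 1st cost
none_broken = update_two_best(old, ("p", "q"), new)     # uses all four quantities
```

At most four values are compared per city and iteration, regardless of the size of the subtour, which is what keeps the per-iteration cost linear in the number of uninserted cities. FLIH then selects the city whose first recorded cost is largest.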

5.3. Design of Fast Max Difference Insertion Heuristic (FMDIH)

A faster design of MDIH is possible on a similar basis to that used in the design of FLIH. A key difference between LIH and FLIH is that, in each iteration, LIH calculates only the expansion costs for the unvisited cities, while FLIH (approximately) computes both the expansion costs and the 2nd expansion costs. By going “one step further” in this sense, FLIH facilitates a much faster update of this information, while sacrificing some exactness. To extend the same principle to fast MDIH, we will go “one step further” in the computations required for the city selection rule in MDIH. In MDIH, city selection requires both the expansion cost and the 2nd expansion cost for each city not in the subtour (see (7)). Therefore, we can design a fast MDIH that additionally computes and records the 3rd expansion cost of each city not in the subtour. This means we need to scan at most five edges per city in each iteration: the two newly added, plus the expansion, 2nd expansion, and 3rd expansion cost edges from the previous iteration. As with FLIH, if one of the latter three was broken in the current iteration, we will simply rely on the others to provide approximations. A detailed procedural summary of the fast max difference insertion heuristic (FMDIH) is given in Procedure 3.

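Procedure FMDIH starts
Form circular linked list $U$ of cities not in the subtour
Form empty linked list $S$
Set up initial subtour of three cities
Delete these three cities from the list $U$
Insert these three cities in the empty list $S$
Initialize expansion costs $e_1(S, u)$, 2nd expansion costs $e_2(S, u)$ and 3rd expansion costs $e_3(S, u)$ w.r.t. all $u$ in $U$ to very high value
while $U$ is not empty do begin
     do begin (Visit all $u$ in $U$)
          Let $u \in U$
          $x_1$ = Cost of insertion of city $u$ on 1st newly formed edge
          $x_2$ = Cost of insertion of city $u$ on 2nd newly formed edge
          $x_3$ = $e_1(S, u)$
          $x_4$ = $e_2(S, u)$
          $x_5$ = $e_3(S, u)$
          If   expansion cost edge of $u$ is broken in last iteration
          then   $e_k(S, u) = \min_k\{x_1, x_2, x_4, x_5\}$ for $k = 1, 2, 3$
          else if   2nd expansion cost edge of $u$ is broken in last iteration
          then   $e_k(S, u) = \min_k\{x_1, x_2, x_3, x_5\}$ for $k = 1, 2, 3$
          else if   3rd expansion cost edge of $u$ is broken in last iteration
          then   $e_k(S, u) = \min_k\{x_1, x_2, x_3, x_4\}$ for $k = 1, 2, 3$
          else   $e_k(S, u) = \min_k\{x_1, x_2, x_3, x_4, x_5\}$ for $k = 1, 2, 3$
          end if
     while all $u$ in $U$ not visited
     Choose city $c$ in $U$ such that cost difference $e_2(S, c) - e_1(S, c) = \max_{u \in U} [e_2(S, u) - e_1(S, u)]$
     Delete $c$ from the list $U$
     Insert $c$ in $S$ on the edge connected to $e_1(S, c)$
end while
Procedure FMDIH ends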

In “Procedure 3”, again, the initial subtour is computed by the first five lines, and the remainder expands it. The inner loop of the remainder is used to compute the expansion cost $e_1(S, u)$, the second expansion cost $e_2(S, u)$, and the third expansion cost $e_3(S, u)$ for each city $u$. The definitions of these costs are given in (5) and (6). The details of the inner loop achieve a shortcut in computing these costs, which involves four scenarios. In the first scenario, the edge related to the expansion cost of city $u$ in the previous iteration is broken. In the second scenario, the edge related to the second expansion cost of city $u$ is broken; in the third scenario, the edge related to the third expansion cost of city $u$ is broken, and in the fourth scenario none of the three expansion edges is broken. In the first scenario the expansion cost, second expansion cost, and third expansion cost are assumed to be the smallest three of four quantities: the two expansion costs of city $u$ computed on the two newly formed edges and its second and third expansion costs in the last iteration. In the second scenario they are taken as the smallest three of a slightly changed set of four quantities: the two expansion costs of city $u$ computed on the two newly formed edges and its expansion cost and third expansion cost computed in the earlier iteration. In the third scenario, the four quantities are now the two expansion costs of city $u$ computed on the two newly formed edges and its expansion cost and second expansion cost from the earlier iteration. In the fourth scenario, the first, second, and third expansion costs are taken as the three smallest of five quantities, that is, the two expansion costs on the two newly formed edges and the first, second, and third expansion costs of the previous iteration. The condition “while all $u$ in $U$ not visited” ensures that the cost of all uninserted cities is computed. The outer loop chooses the city for insertion that has the maximum cost difference (7).
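Procedure 3 can be sketched in runnable Python as follows. As with the earlier sketches, this is an illustrative reading rather than the implementation used in the experiments: it assumes planar coordinates with Euclidean distances, uses dictionaries instead of linked lists, and takes the three starting cities as a parameter (`initial`, a hypothetical name) instead of committing to one of the five establishment rules of Section 3.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points (an assumption of this sketch)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fmdih(points, initial=(0, 1, 2)):
    """Fast max difference insertion: three cheapest known insertions kept per city."""
    def cost(a, b, u):
        # cost of inserting u on the directed subtour edge (a, b), as in (5)
        return dist(points[a], points[u]) + dist(points[u], points[b]) - dist(points[a], points[b])

    i, j, k = initial
    succ = {i: j, j: k, k: i}                       # initial subtour of three cities
    edges = [(i, j), (j, k), (k, i)]
    # rec[u]: three (cost, edge) pairs, cheapest first, for each outside city u
    rec = {u: sorted((cost(a, b, u), (a, b)) for a, b in edges)
           for u in range(len(points)) if u not in succ}
    while rec:
        # selection rule (7): largest gap between 2nd and 1st recorded expansion cost
        c = max(rec, key=lambda u: rec[u][1][0] - rec[u][0][0])
        _, (a, b) = rec.pop(c)[0]                   # insert on the expansion cost edge
        succ[a], succ[c] = c, b                     # edge (a, b) is broken here
        for u, old in rec.items():
            kept = [r for r in old if r[1] != (a, b)]        # drop a broken record, if any
            new = [(cost(a, c, u), (a, c)), (cost(c, b, u), (c, b))]
            rec[u] = sorted(kept + new)[:3]                  # at most five values compared
    tour, city = [i], succ[i]
    while city != i:                                # walk the circular list once
        tour.append(city)
        city = succ[city]
    return tour
```

Because at most one recorded edge can be broken by an insertion, every record always retains at least two old entries in addition to the two new ones, so three values are available after each update and the difference in (7) is always defined.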

6. Experiments and Results

This section consists of four subsections. In Section 6.1 we report experiments that evaluate each of the five types of initial subtour establishment rules described in Section 3 allowing us to “configure” and complete the new MDIH. Later in Section 6.2 performance of MDIH is compared with all baseline heuristics considered in this paper, that is, NIH, FIH, CIH, and LIH. In Section 6.3 the cost based insertion heuristics, that is, CIH, LIH, and MDIH, are compared with their faster versions FCIH, FLIH, and FMDIH. Finally in Section 6.4 the focus of our evaluation is to investigate the potential for FMDIH as a new heuristic for fast approximate solution of TSPs, by comparing it with the current best-regarded heuristic, that is, FIH.

Experiments are conducted on 109 datasets, ranging from 14 to 15112 cities, from a popular test bed [14]. For each heuristic and problem instance, 30 simulations are run independently, and we record the best, worst, and average solutions, the standard deviation in solution quality, and the average execution time. For the majority of experiments we provide summary statistics in this section (the volume of results data precludes otherwise); however, a full description of results is provided in the Appendix for the FMDIH and FIH tests. The hardware used in the experiments was an Intel® Core i3-2348M CPU @ 2.30 GHz with 8.0 GB of memory and a 64-bit operating system. The heuristics were coded in C/C++ using Microsoft Visual Studio.

6.1. Experiments to Finalize Initial Tour Establishment Step of Max Difference Insertion Heuristic (MDIH)

Table 1 summarises the results for MDIH with the five different strategies for generating the initial subtour. The heuristics named MDIH-1 to MDIH-5 represent MDIH using methods (i) to (v) as described in Section 3. Each column in Table 1 refers to a certain statistical value in relation to deviation from the known optimal solution. The mathematical description of each of those values is given in (10)–(14), which correspond, respectively, to the quantities in columns 1–5 of Table 1. In these expressions the number of datasets equals 109 and the number of simulations equals 30; the remaining symbols denote the cost of the solution of a given simulation on a given dataset and the cost of the optimal solution of that dataset.
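As an illustration of how summary values of this kind can be computed, the sketch below averages per-dataset statistics over all datasets. It is a sketch under stated assumptions, not a transcription of (10)–(14): the function name `deviation_summary` and the input layout are hypothetical, deviation is assumed to be measured as a percentage of the optimal cost, the standard deviation is assumed to be the population form expressed as a percentage of the optimum, and only the four statistics named in the text (best, worst, average, standard deviation) are covered.

```python
from statistics import mean, pstdev


def deviation_summary(results):
    """Average per-dataset deviation statistics over all datasets.

    results -- {dataset_name: (optimal_cost, [cost of each simulation run])}
    Returns the mean over datasets of the best, worst, and average percent
    deviation from the optimum, and of the standard deviation as a percentage
    of the optimum (population standard deviation is an assumption).
    """
    best, worst, avg, std = [], [], [], []
    for optimal, costs in results.values():
        pct = [100.0 * (c - optimal) / optimal for c in costs]
        best.append(min(pct))
        worst.append(max(pct))
        avg.append(mean(pct))
        std.append(100.0 * pstdev(costs) / optimal)
    return {
        "best": mean(best),
        "worst": mean(worst),
        "average": mean(avg),
        "std": mean(std),
    }
```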

In Table 1, it can be seen that MDIH-5 has the best average solution quality. Its average quality for the worst solution is also among the best, and its standard deviation is the best of the five. On only one criterion, the average of the best solution, is its performance the worst of the five. However, the difference between the best and worst values for this statistic is only 0.17%, and we propose that it can be ignored in favour of choosing method (v) as the appropriate initial subtour establishment rule for MDIH and FMDIH.

6.2. Experiments to Compare Performance of All Baseline Heuristics NIH, CIH, LIH, FIH, and MDIH

Table 2 summarises the results of the five tour construction heuristics NIH, CIH, LIH, FIH, and MDIH, with MDIH configured to use the fifth initial tour establishment rule. (The results for MDIH are already in Table 1 but are copied into Table 2 for convenient comparison.) From the results it can be seen that MDIH performs best among all heuristics on all five criteria. For example, the average deviation from the optimal solution for MDIH is 4.64%, and there is a relatively large gap to the second best in this category, which is FIH at 7.27%. FIH is second best (to MDIH) on all criteria except standard deviation. Meanwhile, LIH shows the worst performance among those examined on all criteria except the average deviation from the best solution. Closer inspection of the full results, however, shows that the relatively poor figures for LIH derive from a notable failure on only one problem instance, BRG180. On this instance its best solution is 260% (average + 22 standard deviations) away from the optimal, and its worst solution is 2032.82% (average + 59 standard deviations) away from the optimal. This dataset brings out the worst performance from each of the heuristics and is clearly an unusual example, with a structure that deviates from that of the vast majority of TSPs. In fact this problem arises from a participant-comparison metric in a card-playing tournament, and the performance of any heuristic on it is therefore not reflective of performance on cases where the matrix contents are more closely associated with physical distances. We therefore decided that omitting this dataset would lead to a more useful cross-comparison of the heuristics, and the outcome for the remaining 108 datasets is given in Table 3; MDIH is preserved as the best performer in all categories, while LIH is now third best.

In Figure 2 we attempt to visualise the performance of these five heuristics as a function of the number of cities, with the number of cities on the horizontal axis and, on the vertical axis, the percent difference between the average solution of 30 simulations and the optimal solution. Since there is a dense accumulation of datasets up to 2000 cities, the picture is not very clear up to this point. Beyond it, however, the pattern is quite clear. It can be seen that the heuristics follow two different patterns of rises and falls. One pattern is followed by MDIH, FIH, and LIH, while CIH and NIH follow another. The MDIH-FIH-LIH pattern is dominated by MDIH, followed by FIH, while the CIH-NIH pattern is dominated by CIH. Notably, we can see that MDIH is not bettered by any of the other heuristics as we move beyond the 2000-city mark.

To make the picture clear for datasets of up to 2000 cities, another graph is presented in Figure 3. This graph omits the results for datasets of more than 2000 cities so that the smaller problems can be seen more clearly. Even in this graph the picture is not clear up to 200 cities, as there is a dense accumulation of datasets up to this point. Beyond the 200-city mark, however, the picture is quite clear. Again there appear to be two patterns, one followed by CIH and NIH and another followed by MDIH, FIH, and LIH. The former is dominated by CIH and the latter by MDIH. Overall, the MDIH graph mostly dominates all the other graphs.

6.3. Experiments to Compare Performance of Cost Based Insertion Heuristics with Their Faster Counterparts

We turn now to the fast approximate variants of these heuristics. Table 4 shows comparative results for the standard and fast/approximate variants of CIH, LIH, and MDIH. From the table it can be seen that the differences between the standard and fast variants are small across all four criteria. The largest difference between any standard and fast variant occurs in the case of CIH and is in favour of the faster method, with FCIH solution quality on average deviating 13.99% from optimal, compared with 14.26% for CIH. In all cases, the fast method naturally provides a significant advantage in execution time, with a reduction of 70–80% from the execution time of the standard variant.

We also performed statistical tests (Student’s t-test) to investigate whether or not there was any significant difference in solution quality between the standard and faster versions. Comparing CIH and FCIH at a confidence level of 95%, CIH performed better than FCIH on 17 out of 108 datasets, while FCIH performed better than CIH on 10 out of 108 datasets. Similarly, comparing LIH and FLIH, LIH performed better than FLIH on only 2 out of 108 datasets, while FLIH performed better than LIH on 18 out of 108 datasets. Comparing MDIH and FMDIH, MDIH performed better than FMDIH on only one out of 108 datasets, while FMDIH performed better than MDIH on 3 out of 108 datasets. From these results it is clear that the new heuristics detailed in this paper are competitive in solution quality with the original slower versions; moreover, when there is a statistical difference in quality, this is more often than not (with the exception of CIH versus FCIH) in favour of the faster heuristic. We would conclude that the faster heuristics are generally worth applying to save computational time, with the risk of jeopardising solution quality being balanced by a similar chance of improved solution quality.
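A per-dataset comparison of this kind can be sketched as follows. The text does not state which form of the t-test was used, so this sketch assumes the two-sample pooled-variance (equal-variance) form applied two-sided to the 30 tour costs of each heuristic. The names `pooled_t_statistic` and `compare` are hypothetical, and the critical value 2.002 (58 degrees of freedom, two-sided 95%) is taken from standard t tables rather than from the paper.

```python
from math import sqrt
from statistics import mean, variance

# Assumed: two-sided 95% critical value for 30 + 30 - 2 = 58 degrees of freedom.
T_CRIT_95_DF58 = 2.002


def pooled_t_statistic(costs_a, costs_b):
    """Two-sample Student t statistic with pooled variance; None if variance is zero."""
    na, nb = len(costs_a), len(costs_b)
    pooled_var = (
        (na - 1) * variance(costs_a) + (nb - 1) * variance(costs_b)
    ) / (na + nb - 2)
    if pooled_var == 0:
        return None
    return (mean(costs_a) - mean(costs_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def compare(costs_a, costs_b, t_crit=T_CRIT_95_DF58):
    """Return 'a', 'b', or 'tie'; a lower tour cost counts as better."""
    t = pooled_t_statistic(costs_a, costs_b)
    if t is None:
        # Both samples are constant: fall back to comparing the means directly.
        diff = mean(costs_a) - mean(costs_b)
        return "tie" if diff == 0 else ("a" if diff < 0 else "b")
    if t < -t_crit:
        return "a"
    if t > t_crit:
        return "b"
    return "tie"
```

Counting the datasets on which `compare` returns each label would give win counts of the kind reported above, under the stated assumptions.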

We note that a detailed performance comparison among the faster designs is not necessary, since the performance characteristics of the faster designs clearly echo those of the standard designs; therefore comparison among faster versions of heuristics reflects the same comparative analysis that was done among standard versions in Table 3.

6.4. Experiments to Compare Performance of FMDIH and FIH

Since MDIH has turned out to be the best among all heuristics in the above experiments, and its faster version FMDIH has provided solution quality equal to that of MDIH with time complexity reduced to that of FIH, this represents a leap forward in the solution quality achievable at this order of time complexity. To test whether or not this advantage is statistically significant, we give a comparison of FMDIH with FIH in the Appendix, with full detail and commentary along with statistical tests. In terms of the columns in Tables 2–4, we simply note here that the FMDIH summary values are, in order of the columns, as follows: 4.30, 8.27, 6.02, 0.99, and 0.14. These are to be compared with the FIH summary values from Table 3, which are, respectively, 6.84, 12.89, 9.51, 1.52, and 0.64. Here, however, we visualise the relative performance of FMDIH and FIH (previously the best reported tour construction heuristic) in Figure 4, again using the same axes as in Figures 2 and 3. It can be seen in Figure 4 that FMDIH is consistently better than FIH throughout the dataset spectrum, with the advantage apparent even in the dense area of problem instances smaller than 2000 cities.

7. Conclusions and Future Work

In this paper a previously unpublished tour construction heuristic, namely, the largest insertion heuristic (LIH), is introduced. It is shown that LIH completes the set of well-known tour insertion heuristics (cheapest, nearest, and farthest) and seems to produce better solution quality than the cheapest and nearest insertion heuristics on the great majority of datasets. Furthermore, this paper also introduces the max difference insertion heuristic (MDIH). It is based on a max difference concept previously introduced in [11] as a way to engineer variants of other heuristics; however, in this paper we have argued that, following its “completion” by defining an initial subtour establishment method, it merits “first-class” status as a heuristic in itself, alongside others of the same type. The resulting heuristic MDIH produced better solutions on average than all the other baseline heuristics considered in this paper. Furthermore, we have introduced complexity curtailing techniques that reduce the worst-case complexity of the cost-based insertion heuristics, that is, CIH, LIH, and MDIH, to the same order as that of FIH, resulting in the development of their faster versions FCIH, FLIH, and FMDIH. It is shown that the faster versions do not produce statistically inferior solutions but reduce computational cost by up to 80%. The development of FMDIH proved to be a leap forward in the production of quality solutions among the lowest-order-complexity heuristics, because it performed 3.5% better on average than FIH on a wide spectrum of popular datasets. We believe this is the first time that FIH has been demonstrated to be outperformed by a heuristic of the same order of complexity on a wide spectrum of a popular test bed. Our immediate future work in this direction is inspired by the complexity curtailing techniques: what happens if these techniques are introduced into other, higher-order heuristics to reduce their complexity? We believe more interesting results are yet to come.

Appendix

Comparison between FIH and FMDIH

This appendix presents a detailed dataset-wise comparison between two tour construction heuristics, that is, the farthest insertion heuristic (FIH) and the fast max difference insertion heuristic (FMDIH). Both heuristics have complexity of the same order. However, FMDIH, designed in this paper, has shown consistently better performance across the wide spectrum of datasets in the popular test bed. In Table 5, column 2 shows the results of FIH and column 3 shows the results of FMDIH. Column (a) shows the best value among 30 simulations and its percentage difference from the optimal value. Column (b) shows the worst value among 30 simulations and its percentage difference from the optimal value. Column (c) shows the average value of 30 simulations and its percentage difference from the optimal value. Column (d) shows the standard deviation of 30 simulations and its value as a percentage of the optimal value. Column (e) shows the average execution time of 30 simulations. The datasets are arranged according to size from smallest (14 cities) to largest (15112 cities).

From column (a) of the table it can be seen that FMDIH obtained the optimal solution in 9 datasets, against 7 datasets for FIH. FMDIH obtained a better best solution than FIH in 94 out of 108 datasets, and in 6 datasets the two have best solutions of the same quality. This means that FMDIH has a worse best solution than FIH in only 8 out of 108 datasets. From column (b) it can be seen that the worst solution of FMDIH is better than that of FIH in 105 out of 108 datasets. In two datasets their worst solutions are of equal quality, so in only one dataset is the worst solution of FIH better than that of FMDIH. From column (c) it can be seen that the average solution of FMDIH is better in 105 out of 108 datasets, so FMDIH is beaten in only 3 datasets in terms of average solution. From column (d) it can be seen that FMDIH has a better standard deviation than FIH in 96 out of 108 datasets, so FMDIH is worse in standard deviation than FIH on only 12 datasets.

There are further comparisons of interest. By comparing column 3c with column 2a it can be seen that, in 69 out of 108 datasets, the average solution of FMDIH is better than the best solution of FIH, whereas FIH does not attain this feat on any dataset. By comparing column 3b with column 2c it can be seen that the worst solution of FMDIH is better than the average solution of FIH in 79 out of 108 datasets, while FIH attains this feat on none of the datasets. The most interesting result is obtained by comparing column 3b with column 2a: the worst solution obtained by FMDIH in 30 simulations is better than the best solution obtained by FIH in 30 simulations in 28 out of 108 datasets (>25%). A closer look at the results shows that these 28 datasets start from the RD400 dataset. There are only 49 datasets of size 400 cities and above, and in 28 of these 49 datasets (>50%) the worst solution of FMDIH is better than the best solution of FIH. If the dataset spectrum is narrowed to the band of 1000 cities and above, FMDIH, surprisingly, attains worst solutions better than the best solutions of FIH in 25 out of 31 datasets (>80%). Finally, in all 8 datasets of more than 4000 cities the worst solution of FMDIH is better than the best solution of FIH in 30 simulations. These results show that the advantage of FMDIH grows with the size of the dataset, which clearly puts the performance of FMDIH well ahead of that of FIH. Student’s t-test was conducted, which confirmed that the output of FMDIH is better than that of FIH at the 99.5% confidence level on 100 datasets. On one further dataset FMDIH had better results at the 95% confidence level, whereas FIH did not show statistical superiority on any of the 108 datasets.
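The cross-column comparisons above (for example, column 3b against column 2a) amount to counting the datasets on which one statistic of one heuristic is strictly lower than another statistic of the other. A minimal sketch of that count is given below; the function name `count_dominated` and the dictionary layout are hypothetical and are not taken from the paper.

```python
def count_dominated(stats_a, stats_b, key_a, key_b):
    """Count datasets where heuristic A's key_a value beats B's key_b value.

    stats_a, stats_b -- {dataset_name: {"best": ..., "worst": ..., "average": ...}}
    A lower tour cost is better, so "beats" means strictly smaller.
    Only datasets present in both dictionaries are compared.
    """
    return sum(
        1
        for name, row in stats_a.items()
        if name in stats_b and row[key_a] < stats_b[name][key_b]
    )
```

For instance, `count_dominated(fmdih, fih, "worst", "best")` would give the number of datasets on which the worst FMDIH run is still better than the best FIH run, given per-dataset statistics loaded into the two dictionaries.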

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors are grateful for financial support from Innovate UK and Route Monkey Ltd. via KTP partnership no. 9839.