Abstract
Hub location problems have been studied by many researchers for almost 30 years, and, accordingly, various solution methods have been proposed. In this paper, we implement and evaluate several widely used methods for solving five standard hub location problems. To assess the scalability and solution qualities of these methods, three wellknown datasets are used as case studies: Turkish Postal System, Australia Post, and Civil Aeronautics Board. Classical problems in small networks can be solved efficiently using CPLEX because of their low complexity. Genetic algorithms perform well for solving three types of single allocation problems, since the problem formulations can be neatly encoded with chromosomes of reasonable size. Lagrangian relaxation is the only technique that solves reliable multiple allocation problems in large networks. We believe that our work helps other researchers to get an overview on the best solution techniques for the problems investigated in our study and also stipulates further interest on crosscomparing solution techniques for more expressive problem formulations.
1. Introduction
Hub location problems deal with the location of hub facilities in a network [1]. These problems are highly relevant in several fields, such as transportation [2–6] and telecommunication [7, 8]; see [9–11] for some excellent surveys. In general, flows are routed from origin nodes to destination nodes through a “spokehubhubspoke” structure. First, flows from each origin node are collected by their hubs; then, the flows are transferred through at most another hub; and finally, flows are distributed to their destinations. Because of the economies of scale, there are cost discounts while transporting flows between hubs; the total cost with the “spokehubhubspoke” structure may be less than that for transporting flows from origins to destinations directly [10].
If the number of hubs is fixed at design time, the hub location problem is called hub median problem (pHMP) [12]. In this paper, we consider hub location problems where the number of hubs is fixed. Based on the constraints for node assignment, hub location problems have two basic structures: single allocation (SA) and multiple allocation (MA) [13]. In SA problems, each node is allocated to a specific hub, and all inbound or outbound flows of each node must travel through its hub. In MA problems, on the other hand, the flows for different origindestination (OD) pairs can be routed through different hubs. An example for the SA and MA problem is shown in Figure 1. Over time, many variations of the core problem, including uncertainty and reliability, have been considered by researchers. In early research, it is assumed that all nodes can always function properly. However, in reality, each hub can be disrupted because of possible accidents. For instance, in air transportation networks, nodes (i.e., airports) can be disrupted because of bad weather or equipment outage [14, 15]. Therefore, the reliability of nodes should be considered for more realistic modeling [11, 16], taking into account the actual topology [17]. In early research, the hub networks are assumed to be complete; that is, hubs are fully connected with each other. In recent years, incomplete hub networks have also been studied [18]. In this case, some hubs are not directly connected.
(a) Single allocation problem
(b) Multiple allocation problem
Apart from a few exceptions, hub location problems are NPhard [19, 20]. Therefore, many researchers have focused on proposing reasonable models and efficient algorithms for solving hub location problems. Reference [1] published the first paper about hub location problems, and the first mathematical formulation for pHMP in the following year was proposed [12]. After that, the pHMP has been studied by many researchers. Various types of efficient models and algorithms have been proposed. In early studies for hub location problems, researchers applied different solution methods for solving them. For instance, [21] proposed a heuristic method based on tabu search (TS) for solving single allocation problems. Reference [22] proposed a new linear formulation for single allocation model with fewer variables and constraints than the classical formulations. A heuristic based on simulated annealing algorithm (SAA) and a linear program based on branchandbound (BB) method are presented for solving the problem as well. After that, branchandbound methods and Lagrangian relaxation (LR) algorithm started to be applied to hub location problems. Reference [23] proposed a sophisticated approach based on Lagrangian relaxation for solving single allocation hub problems. Compared with simulated annealing and branchandbound method, the problem with 25 nodes can be solved within a reasonable time and solutions with high quality can be obtained. Reference [24] presented a branchandprice algorithm (BP) for solving a capacitated single allocation problem. To obtain a tight lower bound, Lagrangian relaxation was applied to the problem. Instances with up to 200 nodes were solved optimally with their algorithm. Reference [25] studied the combination of congestion and capacity decisions in hubandspoke network design. They decomposed the problem into subproblems with Lagrangian relaxation. Reference [26] proposed new models with the consideration of uncertainties (hub failures and backup hubs) for single and multiple allocation problems. Based on the heuristics proposed by [23], they used Lagrangian relaxation and branchandbound method to solve reliable models for these two allocation problems.
In addition to Lagrangian relaxation and branchandbound method, genetic algorithms (GA) are also commonly used for solving hub location problems. Authors of [27] proposed an effective method based on a genetic search algorithm. They showed that the GAbased method outperforms other approaches in their study on both computation time and solution quality. The method has a potential for solving large problem instances. Based on their research, [28] proposed two genetic algorithms for solving an uncapacitated single allocation hub median problem. They compared their methods with a tabu search, simulated annealing, and path relinking method (PRM) [35]. Reference [30] proposed a genetic algorithm approach for solving a capacitated single allocation hub median problem, which can guarantee the optimality for networks with up to 50 nodes. The fuzzy capacitated hub center problem was considered by [36]. They provided a genetic algorithm method to solve it. Reference [37] proposed an efficient genetic algorithm for solving the liner hubandspoke shipping network design problem. The authors of [31] presented the formulation and a GAbased algorithm for solving a planar hub location problem in reasonable time. The Weiszfeld algorithm (WA) [38] and the locationallocation algorithm (LAA) [39] were also compared with their heuristic on different datasets. In addition, genetic algorithms were also used to solve the reliable hub location problem by [33]. With the consideration of hub disruptions, they proposed a mathematical model for single allocation cases. The proposed method performs better than CPLEX.
Based on traditional models, several variants of the problems with new constraints were also studied, such as reliability. Reference [8] presented new reliable models for maximizing expected flow between city nodes. Reference [40] proposed a compact mixed integer program model and a continuum approximation model for the reliable uncapacitated hub location problem. With a customdesign Lagrangian relaxation method, nearoptimal solutions can be obtained. In addition to reliability, the incomplete hub networks have also been studied. Reference [18] proposed models for several single allocation problems under incomplete hub network design and the mathematical formulations with variables were presented. Reference [34] also modeled the incomplete hub location problem with and without hopconstraints. Benders decomposition technique was used to solve the problem in large scales.
In recent years, new algorithms and applications on hub location problems have been proposed. Reference [41] proposed approximation algorithms for solving the single allocation problem. Reference [42] studied single allocation problems with multiple capacity levels, where the decision making process includes the size of hubs. An enhanced Benders decomposition (EBD) algorithm was used to solve multiple allocation uncapacitated hub location problem for largescale networks by [29]. The algorithm was evaluated in computational experiments on different instances with up to 500 nodes, compared with Benders decomposition [43], adjustment procedure [44], relaxandcut algorithm (RCA) [45], and CPLEX. Reference [46] proposed two formulations for capacitated ordered median hub location problems. Reference [32] proposed a new method for single allocation hub location analysis. Based on the spatial and flow properties of nodes, a clusteringbased potential hub set (CBS) can be generated, which can help narrow the solution sets and reduce the computational complexity of the problems. In addition to the standard hub location problems above, several variants of hub location problems have also been studied. Reference [47] developed a model with equilibrium constraints for the intermodal hubandspoke problem and proposed a hybrid genetic algorithm for solving it. Reference [48] studied hub location problems under uncertainty. Generic models were proposed for single and multiple allocation problems. Reference [49] also presented a programming formulation for hierarchical multimodal hub location problem with timedefinite deliveries and analyzed the sensitivity on Turkish network. Reference [50] studied the hub location problem under competition with Civil Aeronautics Board (CAB) and Turkish Postal (TR) datasets as case studies.
In Tables 1 and 2, a summary of allocation problems and representative solution techniques in the literature is shown. Most of other papers in the references that are covered by them are not shown here. Various methods have been proposed for solving hub location problems, such as LR, GA, and CBS. Methods are often tuned for specific datasets, specific hardware properties, or specific parameters. Although newly published methods compare with few prior works, the selection of these works is often suboptimal, and comparisons are carried out on different datasets with different parameters; datasets are not often made publicly available, which means that results are not reproducible. Moreover, getting an overview over the state of the art in this field is difficult, as results are published in various domains without much crosstalk and often study variations of the core problems. It is difficult for new researchers to choose the best solution algorithm for a given problem instance, because of the lack of common benchmarks and the dispersal of research in different communities caused by the heterogeneity of problems and approaches.
In this paper, the formulations of five types of phub median problem variations (classical single, classical multiple, reliable single, reliable multiple, and incomplete single allocation problems) are surveyed first. Then, three different solution techniques from the state of the art, GA, LR, and restricted CBS (RCBS) method, for solving hub median problems are provided. In addition, we use CPLEX as a benchmark reference. Finally, to compare the performance of these methods, the commonly used TR dataset [51], CAB dataset [12], and Australia Post (AP) dataset [52] are selected as case studies. In our experiment, the solution qualities, computation time, and main memory usage are evaluated.
We would like to emphasize that this paper is not intended as a theoryfocused survey of solution techniques. There exist plenty of excellent works in this area [9–11]. The major contributions of our study are as follows:(1)We perform an exhaustive evaluation of five types of hub location problems and four solution techniques. We synthesize the essence of solution algorithms and adapt them for different problem instances, to obtain five competitors for each method.(2)All competitors are evaluated against three standard datasets from the literature: TR, CAB, and AP. Running all experiments with the same setup (hardware/software) parameters allows us to provide a comprehensive evaluation of all techniques. Moreover, we provide an unbiased view on the state of the art, as it is perceived by the papers and their methodological descriptions.(3)The wealth of experiments we performed and the number of techniques we compared allow us to draw several insights into scalability, solution quality, and the possibility of exploiting parallelism.
We would like to point out that a comparison of different methodologies has to be taken with caution. Some methodologies were proposed to find provably optimal solutions, while others are pure heuristics without any guarantees on optimality. Comparing their solution quality and solution time allows us to estimate the tradeoff between running time and solution quality by each method for different datasets. This view is novel to the literature, where, usually, methodologies are not crosscompared. In addition, we have implemented several standard solution techniques in the literature, a considerable amount of work, given that the source code is not published alongside of papers. Clearly, we implemented only a subset of existing methods. Finally, the hub location models chosen for our study make simplified assumptions about the network and the modeling level of detail, for instance, ignoring delay and propagation of delay through a network [53]. Therefore, our study should be understood as a first step into the comparative evaluation of hub location problems.
The remainder of this paper is organized as follows. The formulations of five types of hub location problem models are provided in Section 2. Selected solution techniques, that is, genetic algorithm (GA), Lagrangian relaxation (LR), and clusteringbased (CBS) method, are abstracted and synthesized in Section 3. To compare these methods, the evaluation of TR dataset, AP dataset, and CAB dataset as case studies is presented in Section 4. The paper concludes with Section 5.
2. Hub Median Problem
The hub median problems discussed in this study have two distinct structures: single allocation (SA) and multiple allocation (MA). In addition, each problem can also be considered in the classical models and as a reliable model, depending on whether the reliability of hubs is taken into account. In classical models, all hubs are assumed to function properly, while each hub can be disrupted in the reliable models [54]. An example is shown in Figure 2: There is a regular route from Pittsburgh to San Francisco with two hubs, New York City and Los Angeles. If New York City hub is disrupted, an alternative route “Pittsburgh Chicago Los Angeles San Francisco” can be used, and node Chicago is the backup hub of New York City. At last, the single allocation problem with incomplete hub networks is also considered. All five formulations of hub median problems are introduced and formalized in this section: classical single allocation (CSA), classical multiple allocation (CMA), reliable single allocation (RSA), reliable multiple allocation (RMA), and incomplete single allocation (ISA). The major reason for revisiting the mathematical formulation is that many slightly different problem formulations have been used across existing studies, where modifications often have a significant impact on expressivity and time complexity.
2.1. Classical Single Allocation Problem
In classical single allocation problem, each node is allocated to a single hub and all the inbound or outbound flows of each node must go through its hub. Let be a network. Here and are the set of nodes and links, respectively. The numbers of nodes and hubs are and . Let and be the cost and the flow between nodes and for each pair of . The formulation for CSA is shown as follows [55].
CSAP
Here, (1) is the objective function of total costs. Let be the cost of route [56]. Parameter is the cost coefficient (economies of scale discount factor, see, e.g., [57]) for transporting the flows between two hubs. Variables and are the route variable and assignment variable, respectively. They are defined as follows [58]:
Equations (2)(3) ensure that if node is assigned to hub (or assigned to ), the flow from (or to ) must go through hub (or ). Equation (4) ensures that each node can be associated with one hub only. Equation (5) indicates the number of hubs is . Equation (6) indicates that each node can only be assigned to a hub. Based on the fundamental formulation for CSA presented by [59], [22] also formulated CSA with a new model, in which they reduced the number of variables for linear program from to . Finally, an assumption has been used in many studies: , . Therefore, in this paper, we consider only the case of .
2.2. Classical Multiple Allocation Problem
In classical multiple allocation problem, the flows from one origin to different destinations can be routed through different hubs. Thus, we do not need to assign each node to a specific hub. Let be the variable that is used to indicate whether node is a hub, and the route variable is still used. The formulation for the CMA problem is shown as follows [45, 60].
CMAP
Here, (11)(12) ensure that the route for each OD pair must go through hubs. Equation (13) shows that each OD pair can have only one route.
2.3. Reliable Single Allocation Problem
The problems above are both without the consideration of hub reliability. In realworld cases, however, each hub can be disrupted for many reasons. For instance, in air transportation networks, nodes (i.e., airports) can be disrupted due to bad weather or equipment outage. In this case, backup hubs need to be considered. Let be the disruption probability of node . Let be if and 0 otherwise. For the backup assignment, we use the strategy proposed by [26]. They made two major assumptions: first, only single disruptions are considered; second, for the routes with two hubs, the unaffected hub must still be inside the alternative route. Assume and are the backup hub variables that are defined as follows:
Therefore, the RSA problem is formulated as follows [26].
RSAP
Here, the first two terms of the objective function are the cost for regular routes and the left term is the cost for alternative routes. Equations (19)–(22) are the same as (2)–(5). Equations (23)(24) ensure that nodes can be assigned only to hub nodes, and backup nodes must be different from the corresponding hubs. Equations (25)(26) indicate that one hub needs an alternative hub only if it is different from the OD pair and it cannot be disrupted otherwise. It shows that RSAP is a quadratic integer program. It can be linearized with the linearization method proposed by [61].
2.4. Reliable Multiple Allocation Problem
Similar to Section 2.2, variable is used to denote whether node is a hub. Therefore, the formulation of the RMA problem is shown as follows [26].
RMAP
Here, (28)(29) ensure that each hub node can only be assigned to itself. Equations (32)–(35) are similar to (23)–(26).
2.5. Incomplete Single Allocation Problem
For the incomplete single allocation problem, in addition to hub location and assignment, a fixed number hub links needs to be determined. The formulation for this problem is quite different from the formulations introduced above. Let be the total flow from source node , be the total flow to destination node , and be the flow originating from node which is routed on hub link . The decision variable for hub links is defined as follows:
Therefore, the ISA problem is formulated as follows [18].
ISAP
In (37), the first two terms are the cost of flows between hubs and nonhub nodes, and the third term is the cost of flows in hub networks. Equations (38)(39) are the constraints of single allocation and number of hubs. Equation (40) is the constraint for the number of hub links; that is, links are established between hubs. Equations (41)–(43) indicate that nodes can be assigned to a hub only and hub links can be established between hubs only. Equations (44)(45) are the flow equilibrium constraint and flow capacity constraint, respectively.
3. Solution Techniques
In Section 2, we revisited the formulations of five different hub median problems. To solve these problems, various solution techniques have been proposed, such as GA and LR; see the introduction and relevant surveys for details. In most literature, however, researchers usually compared their approaches with very few/outdated methods instead of new algorithms from the state of the art. Furthermore, the properties of some new methods were compared only with certain optimization solvers (e.g., CPLEX). Several approaches from the state of the art are summarized and revised accordingly in this section, such as GA (Section 3.2), LR (Section 3.3), and CBS (Section 3.4), all tailored towards the five standard problem definitions.
3.1. Overview of Solution Techniques
In this section, the idea of each method for solving five hub median problems is revisited. Let us start with GA. A large number of nearoptimal solutions can be obtained with GA, which has been used to solve many complicated problems and its feasibility for solving hub locations problems has been shown in the literature. When solving hub location problems with GA, chromosomes are used to represent solutions [62]. In the CSA problem, each node is allocated to one hub. Thus, node assignments can be modeled as chromosomes directly. The crossover and mutation operators are used to change the node assignments and generate new solutions. For the CMA problem, the chromosomes are represented by the routes between OD pairs. In addition, in the reliable cases, we need to add an operator for selecting the backup hubs, which is used to find an optimal alternative route for each given regular route. For incomplete problem, the connectivity of hub networks is considered first while establishing hub links. In LR, the formulations of five allocation problems can be transformed into two independent subproblems, by adding several Lagrangian multipliers. Then, the lower bounds and upper bounds of the original problem can be obtained by solving these subproblems. The Lagrangian multipliers can be updated iteratively, until reaching the optimal solution. The CBS method is a very novel approach for analyzing hub location problem. This method was proposed for solving single allocation problems. However, it can also be applied to multiple cases. In this method, based on spatial properties and travel demands of nodes, a subset of potential hubs is obtained. Then, several constraints based on these subsets are added to the problem formulations, and the computational complexity is reduced significantly.
Note most solution methods that have been proposed by researchers. We use the original procedures to describe them. However, if a solution method has not been published for solving a given problem, we do some modification based on previous versions of the method. For instance, GA has not yet been published for solving incomplete problems and the procedure of this method is modified for incomplete problems in our study.
3.2. Genetic Algorithms
To solve hub location problems, several researchers presented GAbased methods in recent years. Because most methods are proposed for solving single allocation problems, the standard strategy for CSA using GA is introduced first. Then, based on this simple strategy, we develop an extension for RSA by adding a method for the backup hub selection. The GA strategy developed in this section is based mostly on the approaches proposed by [27, 33].
Initial Population. In the single allocation problem, each node is assigned to a particular hub. Thus, each individual has two arrays: hub array (HS) and assignment array (AY). The hub array HS has a length of and represents the hub set, and the assignment array AY has a length of and represents the assignment of nodes to hubs. If node is assigned to hub , then AY. Here we assume that if node is a hub, then it must be assigned to itself; that is, AY. Based on the solution representation above, the initial population can be generated. However, before that, the hub set must be determined. To reduce the total cost, large flows should travel through the routes with less cost. Let and be the total outbound flow and inbound flow of node , respectively. Thus, it should be more likely to locate hubs at those nodes with large flows to travel (i.e., with large ). Then, in each chromosome, we assign each node to its closest hub.
Crossover. With initial population set, new offspring chromosomes can be generated by the crossover operator. To choose the parent chromosomes, we use roulette wheel sampling [27] and the fitness value of each individual is represented by the reciprocal of the total cost of the solution. The crossover operator is introduced with a small example. As shown in Figure 3, two parent chromosomes are selected randomly and the number of nodes and hubs is and , respectively. First, we choose a swapping number from 1 to and a crossover point from 1 to , randomly. Then, hubs are swapped between the two hub arrays. Each assignment array can be divided into two parts by the point , and each part is combined with the opposite part of the other assignment array. For instance, in this example, we assume that and . One hub is swapped between and . If they are 4 and 6, then new hub arrays are and . Next, we combine the left part of with the right part of , and new assignment array is generated. With the same operator, can also be obtained. Now, there may exist an infeasible case where some nodes are assigned to a nonhub node. For instance, in , node 4 is assigned to itself which is not a hub in the new individual. Therefore, to avoid the infeasibility, we need to do a reallocation after each crossover operator. Each node assigned to a nonhub node needs to be reallocated to its closest hub in the new hub array. In addition, each hub must be reallocated to itself, obviously.
(a) Crossover
(b) Mutation
Mutation. To avoid the early convergence near a local optimal solution, the mutation needs to be operated on the chromosomes with a probability. In this paper, two mutation operators are applied to the assignment arrays only: (A) Select two nonhub nodes randomly and swap their assignment. (B) Select one nonhub node randomly and reallocate it to another hub.
Because each hub node must be assigned to itself, the mutation operator is applied only to nonhub nodes (i.e., spokes). For each chosen individual, these two mutation operators are both used, and the result with more total cost is discarded. Using a crossover operator and a mutation operator, two offspring individuals can be generated. Then, elitism is applied in the population. The individuals with worse fitness values are replaced by those with better fitness values.
Backup Hub Selection. With the GA strategy introduced above, the CSA problem can be solved. However, while solving the RSA problem, the backup hubs need to be considered after determining the hub set and assignment. Because the alternative route of each OD pair is independent, we can find the best backup hubs for each individual by considering the route cost. For each OD pair , the regular route is determined by GA. Then, the best backup hubs for can be found by computing the costs of all possible backup hubs which is different from and selecting the minimum one. A similar strategy is used for selecting the best backup hub for .
GA Strategy for Multiple Allocation Problems. Each node is assigned to one hub in single allocation problems. However, in multiple allocation problems, each node can be assigned to different hubs for different OD pairs. Thus, the chromosome size is equal to the number of OD pairs here. Because we only consider the case , the size is equal to . While generating the initial population, we use the same method as single allocation problem to construct the hub array. Then for each , where , and , we choose a pair of hubs as its regular hubs from the hub array randomly and the backup hubs can also be selected with the method above similarly.
In the crossover step, we sort OD pairs in the order of and OD pair has an order number . Then, we choose two parent chromosomes with roulette wheel sampling and the hub arrays are swapped with the crossover method introduced above. After selecting a crossover point randomly, all the OD pairs with an order number less than are swapped between two parent chromosomes. Then, some OD pairs may use nonhub nodes as their hubs. Therefore, these OD pairs need to be reallocated to their best hubs with the least cost. After the reassignment, the last offspring individuals are obtained. In the mutation operator, a nonhub node and a hub are chosen randomly. Then all first hubs (or second hubs) of OD pairs (or ) are set to hub .
GA Strategy for ISA Problem. GA has not yet been published for solving ISA problems and we propose a new strategy here. In addition to the hub array and assignment array, the hub links should also be determined in the ISA problem. Thus, a set of hub links is added to each individual. We use the same method as single allocation problem to construct the hub array and assignment array while generating the initial population. In order to guarantee the connectivity of the hub network, a randomized version of the Prim algorithm [63] is used to generate hub links (See Algorithm 1).

In crossover step, after obtaining offspring hub sets with the same strategy as Figure 3, the link sets of offspring can be generated as follows: Let and be hub sets of two offspring. Let and be the link sets of two parent chromosomes. Then, , , and links in complete graph in . For the first offspring, we remove several links from with the guarantee of connectivity of network. The remaining links construct the link set of the first offspring individual. If is not connective or the number of its links is less than , we can add several links to it randomly first. The same strategy is applied to the second offspring. In the mutation step, in addition to the operation on assignment array, a link is selected from the link set randomly and replaced by another random link on the basis of connectivity of hub network.
3.3. Lagrangian Relaxation
As a widely used algorithm, Lagrangian relaxation has also been applied to hub location problems [23, 26]. In this section, we introduce the procedure of LR for solving hub location problems mostly based on the formulation CSAP (see (1)–(8)) presented in Section 2.1. The methods for solving the other three problems are similar.
Lower Bound. Let , , , , , be the Lagrangian multipliers of (2)–(4). Then CSAP can be transformed to a Lagrangian problem as follows: where
Note that two groups of variables and are independent now. Thus, the Lagrangian problem can be decomposed into two subproblems as follows.
LSub1
LSub2
Here (54) is used to reduce the size of the solution set in the model, and the result qualities can be improved significantly. The lower bound of the CSAP can be obtained by solving the two subproblems with the methods proposed by [23]. To obtain the optimal solution efficiently, we need a method for finding the upper bound.
Upper Bound. The upper bound can be obtained based on the solution of LSub1. After solving this subproblem, a set of hubs and some assignments are determined. However, in these results, some nodes may be assigned to more than one hub, and some nodes may be unassigned. Therefore, Algorithm 2 is used to find an upper bound with a feasible solution.

With the methods above, the lower bound and upper bound can be obtained for a group of particular Lagrangian multipliers. In this section, the complete procedure of LR with multiplier updating is shown as follows.
(a) Initialization includes the following: iteration number , max number of iterations , max gap , max iteration number for no improvement of lower bound , step size multiplier , and initial Lagrangian multipliers , , .
(b) Finding the lower bound LB is by solving LSub1 and LSub2.
Finding the upper bound UB is with Algorithm 2.
(c) Updating Lagrangian multipliers is as follows [64]:where
(d) If the lower bound LB is not improved in continuous iterations, then , and , , also need to be set to the current best values.
(e) If iteration number or , stop; else, go to (b).
The CSA problem can be solved with the procedure above. CMA, RSA, and RMA are solved with similar strategies proposed by [23, 26]. LRbased technique has not yet been published for solving ISA problems and, in fact, it is nontrivial to develop such a method. We need to define Lagrangian multipliers for several constraints and construct the Lagrangian problems. The constraint like (54) that can increase the convergence speed is necessary to be added to the Lagrangian problems but difficult to construct. Therefore, LR for the ISA problem is not evaluated in our study.
3.4. ClusteringBased Method
Reference [32] proposed a novel method for analyzing hub location problems by considering clusteringbased potential hub sets. They define the travel demand of node as . Then, the following observations are obtained: Nodes with larger travel demands are more likely to be chosen as hubs, especially those nodes far away from other nodes. However, some of the nodes are rarely chosen as hubs in the optimal solutions. In a cluster of nodes that are close to each other, at most one of them can be selected as a hub usually, although the travel demand of this hub is not the largest. The hubs prefer to be spread across the whole network. The optimal hub locations are almost insensitive to the value of .
Thus, the solutions of hub location problems strongly depend on both the spatial properties and the travel demands of nodes. Based on these properties, a heuristic is developed for generating a subset of potential hubs. The hubs for the optimal solutions can be selected from this subset and the computational complexity of the problem can be reduced. To generate the potential hub sets, all nodes need to be sorted by their importance, since the node importance depends on spatial properties and travel demands. The node importance can be measured with different formulations, and one of the best measures identified by [32] is as follows: where . Here, those nodes with large travel demands and far away from other nodes tend to be chosen as hubs. Then, we list all the nodes in descending order with the value of importance and the potential hub set is generated with most important nodes. The hubs are more likely to be selected from (note that if , then ).
Let be the average shortest distances between nodes in , and is the th most important node in . Therefore, the clusteringbased potential hub set algorithm (CBS) can be shown as Algorithm 3.

Here, each potential hub has a “hub circle” with a radius of . Node will be assigned to set , if there are no other nodes from in its hub circle; node will be assigned to set , if there are two or more nodes from that are less important than itself; node will be assigned to set , if is within the hub circle of node and node is more important than . Therefore, some subsets of potential hub nodes are generated based on the clustering properties, and several constraints for hub location can be added to the problem formulations as follows.
CBS
There must be at least one hub in each set , , and at least one hub in set . There is a restricted variant of CBS, called RCBS. In addition to (58), it adds another constraint where all the hubs must be selected from the potential hub sets, and the formulation is shown as follows:
Note that CBS adds constraints independently of the problems at hand. Therefore, it is a prefiltering method for hubs and can be used for all problems without modification. In addition, there are several other variants of CBS, such as augmented CBS and improved RCBS. The details can be found in [32].
4. Evaluation
In this section we report the results of our experimental evaluation of all implementations. Section 4.1 introduces the three datasets used in our study and describes the experimental setup. The results and analysis of the CSA, CMA, RSA, RMA, and ISA problems in the TR dataset are provided from Sections 4.2 to 4.6. In Section 4.7, we present the results for the AP dataset and CAB dataset that corroborate our findings for the TR dataset. The hub assignments for selected problem instances and datasets are visualized in Section 4.8. The analysis of parallel computation is shown in Section 4.9.
4.1. Introduction of Datasets and Experiments
To compare the performance of the methods introduced in Section 3, three commonly used datasets are utilized in this study. The TR (Turkish Postal) dataset includes 81 cities with distance and flows between them in Turkish Postal System [51]. The AP (Australia Post) dataset provides 200 nodes representing postcode districts with their coordinates and the flows between them in Australia [22]. The CAB (Civil Aeronautics Board) dataset is based the airline passenger interactions between 25 cities in USA in 1970 [12]. The numbers of nodes for the TR, AP, and CAB dataset are set to , , and . We set the cost discount coefficient for the TR dataset and CAB dataset, as well as for the AP dataset. For complete (CSA, CMA, RSA, and RMA) problems, we set the numbers of hubs for three datasets . For ISA problems, we set , , , [18]. The visualizations of all three datasets are shown in Figure 4. For the reliable models, the disruption probability of each node is set to a number randomly from . The rerouting coefficient is set to 2.
(a) TR dataset with
(b) AP dataset with
(c) CAB dataset with
We use the settings for GA and LR here as recommended in the literature. In GA, we set the population size for single allocation problem and for multiple allocation problem. The maximum number of generations is 200 [27]. If the best fitness value cannot be improved within 10 consecutive generations, the algorithm will stop. In LR, the step size multiplier is set to 6, and the maximum iteration number is 3000 [23]. If the lower bound cannot be improved within 50 consecutive iterations, then and the Lagrangian multipliers will be set to the current best values. If the maximum iteration number is reached or the gap between the lower bound and upper bound is less than a certain value (≤0.1%) or all Lagrangian multipliers are equal to zero, then LR can be terminated. Note that the results for both GA and LR could be further improved by tuning the parameters. Since CBS provides a refinement of potential hubs, we use it with CPLEX together as a preselection method. In this experiment, its restricted variant RCBS is tested because of its short computation time compared with CBS and other variants. In addition, the results obtained by CPLEX are used as a benchmark.
GA and LR are implemented with Python, while the RCBS and LP are solved with CPLEX. Because CPLEX uses multiple threads as default, we use a single thread in all experiments to obtain comparable results. Finally, the maximum computation time is set to 3,600 seconds; that is, an experiment will be terminated once the running time exceeds 3,600 seconds. The results for the TR dataset, AP dataset, and CAB dataset are presented in Figures 5–8. The solution gaps, computation time, and main memory usage are used to measure the performances of four methods [65]. Because the observations on three datasets are generally similar, we only report the results for TR in detail here and summarize the differences to other datasets subsequently. All experiments were executed on a server with 32 cores and 386 GB RAM, running Fedora 24 (Linux 4.8.4200.fc24.x86_64).
(a) Convergence for the TR dataset
(b) Convergence for the AP dataset
(c) Convergence for the CAB dataset
4.2. Classical Single Allocation Problems
The solution gaps, computation time, and main memory usage of four methods for CSA problems in the TR dataset with different numbers of nodes are shown in the first column (a, f, k) of Figure 5. Note that axis of all three figures is in log scale. In Figure 5(f), it can be seen that these methods all have similar computation time (around 1 second) while solving the problems with small size . With an increasing number of nodes, the computation time for CLPEXLP, LR, and RCBS increases significantly. In particular, when the number of nodes is up to 50, both CPLEXLP and LR reach the maximum computation time of 3,600 seconds and they cannot provide good solutions in this case. However, GA performs well here. It provides very good solutions with small gaps within a reasonable computation time for different numbers of nodes. Note that RCBS can also terminate within 3,600 seconds for . However, it sometimes provides a poor solution with a large gap even for small number of nodes because its additional constraints limit the search space too much. In Figure 5(k), the main memory usage of CPLEXLP and RCBS increases significantly with an increasing number of nodes. When , the memory usage of these two CPLEXbased methods both is over 10 GB while GA and LR use much less memory than them. Therefore, GA performs the best for solving CSA problems in the TR dataset. In addition, it needs more computation time to obtain good solutions for larger numbers of hubs with CPLEXLP, GA, and RCBS.
4.3. Classical Multiple Allocation Problems
The computational complexity of multiple allocation problems is much lower than for single allocation problems. Therefore, CMA is solved in polynomial time if the hub set is determined [26], because the problem can be solved by finding the best route for each OD pair independently in this case. The results for CMA problems in the TR dataset are presented in the second column (b, g, l) of Figure 5. The results show that CPLEXLP and RCBS take much shorter time for solving CMA than CSA. Thus, CPLEXLP can be used to solve CMA problems within a reasonable time for small numbers of nodes . However, just like the performance in CSA problems, RCBS sometimes provides relatively poor solutions with large gaps. Note that GA becomes the worst method with largegap solutions and long computation time because in the case of multiple allocation, the size of chromosomes is up to the number of OD pairs . It is more difficult to reach the nearoptimal solutions with random exploration in this case. We have increased the population size to 1000 in additional experiments with GA; however, results did not improve significantly. LR still needs a long computation time for large number of nodes. In Figure 5(l), similar to Figure 5(i), CPLEXLP and RCBS need much more memory with an increasing number of nodes, while the memory usage of LR is always much less than them. Because of the larger size of chromosomes, GA uses more memory for CMA than CSA problems. Finally, similar to Section 4.2, it is more difficult to obtain good solutions for larger with CPLEXLP, GA, and RCBS.
4.4. Reliable Single Allocation Problems
The reliable problems have more computation complexity than the classical problems. Thus, the maximum number of nodes for the evaluation of two reliable problems is set to . The performance of four approaches for solving RSA problems is shown in the third column (c, h, m) of Figure 5, indicating that when the number of nodes is , we cannot obtain any optimal solution with CPLEXLP directly. The results provided by RCBS are slightly better, but it cannot still guarantee the optimality of the solutions even for networks with small sizes. LR can provide acceptable solutions in networks of small size . However, with an increasing number of nodes, the solution gaps become large because of the slow computation for each iteration. The observation in Figure 5(m) is still similar to Figures 5(k) and 5(l): the main memory usage of CPLEXLP and RCBS increases fast with increasing numbers of nodes, while GA and LR use much less memory than them. Finally, similar to the case of CSA problems, GA presents solutions with high quality within a reasonable time.
4.5. Reliable Multiple Allocation Problems
The results for the RMA problems are presented in the fourth column (d, i, n) of Figure 5. Similar to the condition in RSA problems, the optimal solutions cannot be obtained with CPLEXLP within 3,600 seconds for . However, the gaps are smaller than in Figure 5(c), because of the lower complexity of multiple allocation problems. GA still performs the worst and RCBS cannot still guarantee the optimality always. These observations are similar to Section 4.3. The influence of the number of hubs is still similar to the sections above.
In general, RMA problems with can be solved by CPLEXLP or LR with small gaps (<0.1%) within 3,600 seconds. However, for the cases of , the problem cannot be solved within 3,600 seconds by any methods proposed in this paper. Therefore, an additional evaluation for the convergence is performed below.
The convergence progress of CPLEXLP and LR for RMA problems with within 3,600 seconds is shown in Figure 6 which indicates that with the increase of time the solutions obtained by LR converge gradually. Finally, the gaps approach small numbers. However, although CPLEXLP can provide gaps less than 10% within 1,500 seconds for all 9 cases , the gaps stay nearly stable within the remaining computation time; that is, providing more computational resources does not improve the solution qualities. In addition, GA and RCBS cannot provide a nearoptimal solution for RMA because of the algorithm structure. Therefore, LR is the best technique to solve reliable multiple allocation problems.
4.6. Incomplete Single Allocation Problems
For the ISA problems, the number of hub links also needs to be considered in addition to the number of hub . We evaluate the following combinations of and : , , similar to existing work on ISA problems [18]. The results are shown in the fifth column (e, j, o) of Figure 5. We only evaluate CPLEXLP, GA, and RCBS here, because LR is a nontrivial algorithm and has not yet been published for solving ISA problems. CPLEXLP still performs well in small networks and RCBS cannot guarantee the optimality. GA strategy proposed by us performs worse than CPLEXLP for . However, it can guarantee small gaps (≤3%) for all the instances while CPLEXLP and RCBS often provide gaps over 10%. The scale of variables of ISA problems is in (37)–(48). Thus, the main memory usage of these three methods for ISA problems all is much less than that for the other four problems.
4.7. Results for the AP Dataset and CAB Dataset
In addition to the TR dataset, the results for the AP dataset and CAB dataset are presented in Figures 7 and 8. The results corroborate our findings for the TR dataset proposed in Sections 4.2 to 4.6. CPLEXLP computes optimal solutions for CSA, CMA, and ISA problems in small instances because of their low complexity. GA still has the best performance, with short computation time and low solution gaps for solving CSA, RSA, and ISA problems, because the problem formulations can be neatly encoded with chromosomes of reasonable sizes. LR needs a long computation time for all problems with large network scales. However, similar to the results in Section 4.5, RMA with large networks scales cannot be solved by any of these methods within 3,600 seconds. From the observation in Figure 6, LR should be used to solve RMA problems. The computation time of RCBS is shorter than LP. However, RCBS sometimes provides poor solutions with high gaps because of the additional constraints in the method.
4.8. Visualization of Results
In the sections above, we analyzed the solution gaps, computation time, and main memory usage of four solution techniques for five types of hub location problems. In this section, the results of CSA and ISA problems for the three datasets are visualized in Figures 9 and 10. The number at the end of the caption of each subfigure is the relative difference between the solution and the minimum one.
(a) TR dataset with LP (108.55%)
(b) TR dataset with GA (0.67%)
(c) TR dataset with LR (0%)
(d) TR dataset with RCBS (1.49%)
(e) AP dataset with LP (186.43%)
(f) AP dataset with GA (0%)
(g) AP dataset with LR (2.45%)
(h) AP dataset with RCBS (2.73%)
(i) CAB dataset with LP (0%)
(j) CAB dataset with GA (0%)
(k) CAB dataset with LR (0%)
(l) CAB dataset with RCBS (5.29%)
(a) TR dataset with CPLEXLP (0%)
(b) TR dataset with GA (3.35%)
(c) TR dataset with RCBS (2.23%)
(d) AP dataset with CPLEXLP (0%)
(e) AP dataset with GA (0%)
(f) AP dataset with RCBS (3.65%)
(g) CAB dataset with CPLEXLP (0%)
(h) CAB dataset with GA (1.33%)
(i) CAB dataset with RCBS (4.37%)
In the first row (a–d) of Figure 9, results of CSA problems for the TR dataset with are proposed. CPLEXLP only provides a feasible solution in 3,600 seconds because of the large network scale. Thus, the assignment in Figure 9(a) is rather useless. The assignments of other three methods also have some difference, and their solutions are not far away from each other. The results of CSA problems for AP dataset with , , are shown in the second row (e–h) of Figure 9. CPLEXLP and RCBS provide only feasible solutions with bad assignments. The results of GA and LR have a difference on one hub, but the difference between the values of their objective functions is only 2.45%. In the third row (i–l) of Figure 9, results of CSA problems for the CAB dataset with , , are shown. It indicates that the hub sets and node assignments obtained by CPLEXLP, GA, and LR are all optimal. However, the results of RCBS are different: Detroit (DTT) is selected as a hub because of its high node importance (obtained by (57)) instead of Dallas (DFW).
In the first row (a–c) of Figure 10, results of ISA problems for the TR dataset with , are presented. The structures of links obtained by three methods are similar to each other. Only a few of hubs are different between three figures. In the second row (d–f) of Figure 10, results for the AP dataset with , , are shown. The optimal solution is obtained with CPLEXLP and GA. RCBS selects node 35 and node 16 as hubs because of their high importance. Results for the CAB dataset with , , , with three methods are shown in the third row (g–i) of Figure 10. Here, the optimal solution is obtained with CPLEXLP. Although the assignment provided by GA looks quite different from that by CPLEXLP, their solutions only have a difference of 1.33%. Detroit (DTT) is still selected as a hub instead of Dallas (DFW) by RCBS.
4.9. Analysis of Parallel Computation
The results from Sections 4.2 to 4.7 are obtained using a single thread only. However, the number of threads can be modified easily for CPLEXLP and RCBS. In this section, we analyze the performance of these two solution techniques with different numbers of threads.
As shown in Figure 11, two classical problems with , , , two reliable problems with , , , and the ISA problem with , , , for the CAB dataset are used as case studies. CPLEXLP and RCBS provide good solutions, with gaps less than 0.1% within 3,600 seconds for all five problems. Thus, we consider the computation time in this section. It can be seen that in the RMA problem (Figure 11(d)) the computation time of two solution techniques can be reduced significantly with an increasing number of threads. For the ISA problem (Figure 11(e)), the computation time can be reduced by adding the threads when the number of threads is less than a fixed number (here, the fixed number is 8). However, in the other three figures, there is no significant correlation between computation time and the number of threads. For future work, the parallel computation of GA and LR should be implemented. The impact of the number of threads will also be analyzed.
(a) CSA
(b) CMA
(c) RSA
(d) RMA
(e) ISA
5. Conclusions
In this paper, we reviewed, implemented, and compared several methods (genetic algorithm, Lagrangian relaxation, clusteringbased method, and CPLEX) for solving five types of hub location problems (CSA, CMA, RSA, RMA, and ISA). Because most methods were proposed for solving one or two types of hub location problems, we made appropriate modifications for these methods to solve all kinds of hub location problems. In order to evaluate the scalability and solution quality of these methods under the same computational environment, three standard datasets (TR, AP, and CAB) were used as case studies. In the evaluation, gaps of the objective functions, computation time, and main memory usage were used to compare the performance of these methods. There are seven major observations obtained from our evaluation:(1)CPLEXLP performs best on solving two classical (CSA, CMA) problems and incomplete single allocation (ISA) problems with small size because of the low problem complexity.(2)GA provides good solutions within short time for solving three single allocation (CSA, RSA, and ISA) problems, because small chromosomes are sufficient for modeling single allocation problems. However, it might be a poor choice for solving multiple allocation (CMA and RMA) problems, because the size of the chromosomes is in (here is the number of nodes).(3)LR should be used to solve reliable multiple allocation (RMA) problems, since other methods perform poorly for solving these problems, because the computational complexity of RMA and CMA is similar for LR. However, while using other methods, the complexity of RMA is much higher than CMA.(4)RCBS can be used to obtain a solution within a shorter time than CPLEXLP. However, because there are additional constraints in the algorithm, the optimality of the solution cannot be guaranteed with RCBS. Although these constraints can reduce the computation time, the problems may be changed already. Because of the constraints of potential hub set, some nodes with high importance are more likely to be selected as hubs no matter whether they are the optimal hubs, such as Detroit (DTT) in Figures 9(l) and 10(i).(5)CPLEXLP, GA, and RCBS use more main memory for larger networks, whereas the memory usage of LR remains almost stable for different numbers of nodes. CPLEXLP and RCBS need much more memory than GA and LR.(6)The structures of solutions obtained with several methods are sometimes completely different, yet, the values of their objective functions usually are not so far away from each other. For a given problem, more computation time is needed to obtain good solutions for large numbers of hubs with CPLEXLP, GA, and RCBS.(7)For reliable multiple allocation (RMA) problems, the computation time of CPLEXLP and RCBS can be reduced significantly with an increasing number of threads. For incomplete single allocation (ISA) problems, the computation time of CPLEXLP can be reduced by adding the threads when the number of threads is less than a fixed number. However, there is no significant correlation between computation time and the number of threads for other three types of problems.
Finally, note that several other types of hub location problems, such as the uncapacitated hub location problem and the allocation hub median problem [66], are not evaluated in this paper. For further work, these different problems and solution techniques should be considered. Hub location problems can be applied to other network design problems, multimodal systems, or multiple airport systems [67–70], and this can also be studied for future work. In this paper, we used CPLEX directly; future work could implement more sophisticated methods, such as Benders decomposition. In addition, GA performs well in single allocation problems in the current study. However, the computation time and the usage of main memory increase quickly with the growth of the network. Therefore, the combination of GA and CBS can be considered, that is, operating the GA evolution in the initial population obtained by CBS. In addition, the compression for representation of allocation arrays can reduce the usage of main memory. The parallel computation of several independent subpopulations in multiple threads may also speed up the algorithms.
Abbreviation Summary of Methods
AP:  Australia Post 
BB:  Branchandbound 
BP:  Branchandprice 
(R)CBS:  (Restricted) clusteringbased potential hub set 
EBD:  Enhanced Benders decomposition 
GA:  Genetic algorithm 
LAA:  Locationallocation algorithm 
LP:  Linear program 
LR:  Lagrangian relaxation 
PRM:  Path relinking method 
RCA:  Relaxandcut algorithm 
SAA:  Simulated annealing algorithm 
TS:  Tabu search 
WA:  Weiszfeld algorithm. 
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This study is supported by the research fund from National Natural Science Foundation of China (Grant nos. 61601013, 61650110516, 61521091, and 91538204).