#### Abstract

This paper investigates joint decisions on airline network design and capacity allocation by integrating an uncapacitated single allocation p-hub median location problem into a revenue management problem. For the situation in which uncertain demand can be captured by a finite set of scenarios, we extend the integrated problem from average-profit maximization to a combined average-case and worst-case analysis. We formulate the problem as a two-stage stochastic program that maximizes profit, accounting for the cost of installing the hubs and a weighted sum of the average-case and worst-case transportation costs and ticket revenues over all scenarios. The model yields flexible decisions by adjusting the relative emphasis on average-case and worst-case profits. A genetic algorithm is applied to solve the problem. Computational results show that the proposed formulation outperforms a formulation that considers only the average case.

#### 1. Introduction

Since the Airline Deregulation Act was enacted in 1978, airlines have introduced revenue management, restructured their networks, and developed centralized control centers to establish and sustain a competitive edge in a market-driven environment [1]. Hub location and revenue management therefore play significant roles in the development of the airline industry, and new challenges remain in both disciplines.

The hub location problem locates the hubs and connects the non-hubs to them with fewer links, which lets an airline network consolidate traffic flows and substantially reduce cost. In revenue management, these flows are sold as air tickets, which can be allocated to different customer segments to obtain more revenue. The two disciplines are therefore closely related. Until now, only two publications [2, 3] have integrated hub location into revenue management to maximize airline profit. These studies consider the average profit, which relies on an underlying probability distribution that is difficult to identify. Our research is also inspired by the weight-based combination of average-case and worst-case values in [4], which provides flexible decisions for an emergency response network design problem by putting relative emphasis on the average-case and worst-case costs. Consequently, our aim is to maximize the profit of integrating a hub location problem into a revenue management problem and to provide less conservative and less restrictive solutions through a weight-based combination of average-case and worst-case profits.

The main contribution of our research is a new mathematical model that offers flexible decisions for the integrated problem of hub location and revenue management. We present a weight-based two-stage stochastic programming formulation that combines average-case and worst-case profits. To reduce the computation time, a genetic algorithm (GA) is applied. The results show that our formulation outperforms the one that considers only the average case. The effects of the weight values, the network configuration and the discount factor on the profit are also discussed.

The organization of this paper is as follows: Section 2 presents a brief literature review. Section 3 describes an integrated problem of hub location and revenue management. Section 4 details the solution methodology. Section 5 discusses the results. Section 6 summarizes our conclusions.

#### 2. Literature Review

Since [5] introduced the concept of revenue management, the topic has received tremendous attention in both academia and industry. The main problems in revenue management can be categorized as pricing, capacity control, overbooking, auctions and forecasting [6]. This paper focuses on the capacity control problem, which allocates the capacities on all itineraries of an airline network to different booking classes, segmented by customer demand, over a fixed period before departure in order to maximize airline revenue. Early publications analyze revenue management on a single flight leg without considering network effects ([5, 7]). The field then moved on to the network revenue management problem proposed by [8]. The main difference between the two problems is that the expansion of the airline network greatly increases the number of itineraries. Research on network revenue management can be classified by network topology. On the one hand, for multiple legs without a specific network structure, some studies ([9–14]) decompose the multi-leg problem into many single-leg problems and then allocate capacity to every single leg. On the other hand, the hub-and-spoke network has also received significant attention. [15] considers network revenue management in a hub-and-spoke network. In their assumptions, a single hub can serve spokes. [16] studies capacity allocation in a two-airline alliance under competition in a hub-and-spoke network. [17] studies airline revenue management with consideration of a competitor’s behavior under simultaneous price and capacity competition in a hub-and-spoke network. [18] explores the impact of structural properties on a hub-to-hub network revenue management problem. As the above literature shows, research on airline revenue is closely tied to the airline network structure. Until now, only two papers ([2, 3]) have focused on the integrated study of hub location and revenue management. Our paper continues this line of work. For reviews of revenue management, we refer the reader to, e.g., [19, 20].

Hubs are used to greatly reduce the number of links between origins and destinations in an airline network [21]. The hub location problem selects the hubs from a set of nodes and routes links between the hubs and the non-hubs. Since [22, 23] contributed the first mathematical formulation and solution method for the hub location problem, the topic has received constant attention. The four fundamental hub location problems are [24]: (1) p-hub median, (2) p-hub center, (3) uncapacitated or capacitated p-hub location, and (4) hub covering. Diverse network topologies have also shaped the development of these four problems, e.g., tree, star-star, cycle and hub-line networks. This paper focuses on a star-star p-hub location problem. [25] studies a single allocation p-hub median problem in a star-star network structure. [26] minimizes the routing cost between the hubs and the non-hubs in a star-star network. [27] formulates two separate problems on a star-star network: a p-hub center problem, which minimizes the maximum length of the paths between different nodes, and a p-hub median problem with bounded path length, which minimizes the cost. The performance of the two problems is then analyzed. [28] explores the hardness of a star p-hub center problem. To address the poor service quality incurred by minimizing the routing cost, [29] proposes a -star single-allocation p-hub center problem, in which a min-max criterion is introduced to maximize the service quality and minimize the cost. The studies above, which use a cost-based objective, assume that every origin or destination node must be served. However, if some origin and destination (O&D) pairs are not profitable, the airline need not operate them. This assumption is relaxed by [30], which measures the tradeoff between the profits of the commodities served on some O&D pairs and the cost of hub location and routing between the hubs and the non-hubs. Our paper also considers the hub location problem from the perspective of optimizing profit. The overviews of the hub location problem in [21, 31] provide a detailed treatment.

This paper is based on the recent work [3], which integrates an uncapacitated single allocation p-hub median problem and a revenue management problem in a star-star network. A two-stage stochastic programming formulation is presented to maximize the profit. In the first stage, the hub locations, the links between hubs and non-hubs, and the protection levels of tickets for different booking classes are determined. The booking limits of tickets are obtained in the second stage. Demand is captured by a set of discrete scenarios under the average case. Average-case analysis is effective when the probability distribution is known. However, demand information is often hard to obtain, which makes the results more sensitive and less accurate. To address this concern, robust optimization has been proposed as an alternative method for handling data uncertainty. Instead of using distributional information, robust optimization assumes a defined set of values from which any uncertain parameter can be realized. The solution is obtained under the most unfavorable realization in this set; that is, robust optimization analyzes the worst-case value. However, the choice of the uncertainty set can make the result over-conservative. For more information on average-case and worst-case analysis, readers are referred to [32, 33]. To overcome these shortcomings, research on the complementary effects of worst-case and average-case analysis has appeared ([34–38]). In addition, research on average-case or worst-case analysis for two-stage optimization problems is relatively sparse. [39] compares the worst-case cost of two-stage robust optimization with the expected cost of two-stage stochastic programming. [40] compares a two-stage scenario-based stochastic optimization model with a two-stage robust optimization formulation in response management for residential appliances under real-time price-based demand. These studies still weigh the pros and cons of the two methods rather than providing a more flexible solution. A recent advance is proposed by [4], which formulates a two-stage stochastic programming framework for an emergency response network design problem that minimizes a weighted sum of average-case and worst-case costs. This contribution gives flexibility in network configuration by putting relative emphasis on the average-case and worst-case costs. In our paper, this approach is introduced into the integrated problem of hub location and revenue management.

#### 3. Problem Description

We consider an integrated problem of an uncapacitated single allocation p-hub median location problem and a network revenue management problem. In this star-star network, a central hub 0 is predefined, and every hub must be connected to it. The number of hubs is p, and these hubs are chosen from a set of nodes . A non-hub is a node that has not been chosen as a hub. Consequently, two stars are formed: (1) the links between the hubs and the non-hubs, and (2) the links between the hubs and the central hub. Every O&D pair in this network traverses at least one hub and at most two hubs (not including the central hub). That is, all O&D pairs are composed of two basic links: (1) is the link between non-hub and hub ; (2) is used for connecting hub to central hub 0. Figure 1 illustrates the underlying star-star network structure.

We have the following assumptions for our problem:

(i) The location of the central hub 0 is predefined.

(ii) No direct link is allowed between the two hubs.

(iii) Each non-hub can be allocated to only one hub and direct link between the two non-hubs is not allowed.

(iv) Only the aircraft type with capacity can be available on link. link allows the aircraft type with capacity .

(v) The capacities on and links are considered. The capacity is relevant to the number of flights passing over this link.

(vi) The ticket price at each fare class is given.

(vii) The booking limit of tickets for the fare class is related to uncertain demand and the protection level of tickets. Once the protection level is set, upgrading to a higher fare class or changing to a later or earlier flight shouldn’t be allowed. Demand for different types of fares should be independent. The arrival order of demand is not considered.

(viii) Cancellations and no-shows are not considered.

(ix) Cost discount factor reflects the economies of scale on unit cost on link.

##### 3.1. Notations

The definitions of parameters and decision variables are summarized as follows:

*Indices* , : indices for nodes. : index for hubs. : index for scenarios. : index for fare classes.

*Parameters* : set of nodes in the network. : the number of hubs located in the network. : set of scenarios captured by uncertain demand. : set of fare classes. : weight of the average-case value over all scenarios , where , . : weight of the worst-case value over all scenarios , where , . : distance from non-hub to hub , for and . : distance from hub to central hub 0, for . : transfer cost (per unit distance in a flight) on link at fare class , where . : transfer cost (per unit distance in a flight) on link at fare class , where . In addition, , is a discount factor to describe the effect of economies of scale, . : demand on O&D pair between origin node and destination node at fare class under scenario , where , , and . : the probability of scenario under the average case on O&D pair between origin and destination at fare class , where , , and . : ticket price on O&D pair between origin and destination at fare class , where , and . : capacity on link. That is, the number of flights can pass over link, where and . : capacity on link. That is, the number of flights can pass over link, where . : the transport capacity of a specific aircraft on link is . : an aircraft’s capacity on link is limited to a certain amount referring to passengers volume. : fixed cost for installing a hub , for . : fixed cost for installing a central hub 0. : a very large integer.

*Variables* : booking limit of tickets at each fare class on O&D pair between origin and destination under scenario . : protection level of tickets at each fare class on O&D pair between origin and destination . : binary decision variable. equals to 1 if node is a hub node and 0 otherwise. : binary decision variable. If non-hub is routed to hub , equals to 1. Otherwise, .

##### 3.2. A Weight-Based Two-Stage Model

This paper proposes a two-stage formulation for the integrated problem of hub location and revenue management. The first-stage decisions are the hub locations, the routing of links, and the protection levels of airline tickets for the different fare classes. In the second stage, the booking limits of tickets are set in response to the arrival of customer orders. Uncertain demand is captured by scenarios. In the average case, the probability under every scenario is . and represent the weights of the average-case value and the worst-case value, respectively, where and .

A model that combines the average-case and worst-case values via a pair of weights, based on two-stage stochastic programming, is presented as (1)-(10).

The objective function (1) maximizes a weighted sum of the average-case and worst-case profits. The first term represents the average of: (a) the revenue from ticket sales for the different fare classes on the O&D pair between origin and destination , (b) the transportation cost on link, and (c) the transportation cost on link, with a weight . The second term represents the minimum profit over all scenarios with an associated weight . The profits include the revenue on O&D pairs and the transportation costs on and links. The last two terms represent the installation costs of the hubs and the central hub. The profit is thus determined by the relative emphasis placed on the weights and . Constraint (2) indicates that each non-hub must be allocated to one hub. Constraint (3) defines the number of hubs in the star-star network. Constraint (4) ensures that node can be allocated to hub only when is selected as a hub; that is, all flows are sent and received via the hubs, which avoids direct connections among non-hubs. Constraints (5) and (6) specify that the booking limit of the tickets for each fare class exceeds neither the demand nor the protection level. Constraints (7) and (8) limit the flow on link and link, respectively. In our problem, the flow can be interpreted as the number of flights. and indicate the number of flights on link used by O&D pairs and . The number of flights can be described as and on link. Constraint (7) indicates that the protection level of the flights on link used by O&D pairs and cannot exceed its fixed capacity . A large-valued integer on the right-hand side of Constraint (7) keeps this inequality meaningful even when and . Constraint (8) describes the capacity restriction of link. The protection level of the flights on link used by O&D pairs and should be less than the fixed capacity . In our network, if link is used by O&D pair and otherwise. For , and is the only case satisfying this equation. In this case, the O&D pair comprises link and link when the origin node and the destination node are not routed to the same hub . Constraints (9) and (10) state that and are binary variables and that and are integers.
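To make the weighting in objective (1) concrete, the following is a minimal sketch of how the weighted value could be evaluated once the per-scenario profits (ticket revenue minus transportation costs) are known. It assumes one probability per scenario, which simplifies the indexing used in the paper, and the function and argument names are hypothetical.

```python
from typing import Sequence


def weighted_profit(
    scenario_profits: Sequence[float],
    probabilities: Sequence[float],
    w_avg: float,
    w_worst: float,
    fixed_cost: float,
) -> float:
    """Weighted sum of average-case and worst-case profit, net of hub costs.

    scenario_profits[s] is assumed to be revenue minus transportation cost
    in scenario s; fixed_cost is the total hub installation cost.
    """
    if len(scenario_profits) != len(probabilities):
        raise ValueError("one probability is needed per scenario")
    if not scenario_profits:
        raise ValueError("at least one scenario is required")

    # Average-case term: probability-weighted mean over the scenarios.
    average = sum(q * v for q, v in zip(probabilities, scenario_profits))
    # Worst-case term: minimum profit over the scenarios.
    worst = min(scenario_profits)

    return w_avg * average + w_worst * worst - fixed_cost
```

With the weights (1, 0) the value reduces to the average profit net of installation cost, and with (0, 1) it reduces to the worst-case profit net of installation cost.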

We can further simplify objective (1) by introducing an auxiliary variable . The new model then consists of (11)-(13) together with constraints (2)-(10).
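In generic notation, this kind of simplification follows the standard epigraph pattern for a max-min term. The symbols below are illustrative assumptions rather than the paper's own notation: $w_1$ and $w_2$ are the two weights, $q_s$ is the probability of scenario $s$, $\pi_s(\cdot)$ is the profit in scenario $s$ before installation costs, $F(\cdot)$ is the installation cost, and $z$ is the auxiliary variable.

```latex
\begin{align*}
\max \quad & w_1 \sum_{s \in S} q_s \, \pi_s(x, y) + w_2 \, z - F(x) \\
\text{s.t.} \quad & z \le \pi_s(x, y) \qquad \forall s \in S, \\
& \text{the original constraints on } (x, y).
\end{align*}
```

When $w_2 > 0$, maximization pushes $z$ up to $\min_{s} \pi_s(x, y)$, so the inner minimum no longer has to be written explicitly. When $w_2 = 0$, $z$ drops out of the objective.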

#### 4. Genetic Algorithms

As the network size increases, the computation time needed to find the best solution grows substantially. In this paper, we adopt the GA proposed by [3]. The GA, introduced by [41], is based on natural biological evolution and has been successfully applied to various combinatorial optimization problems in hub location ([42–52]). After the GA generates a feasible solution of and for the model in Section 3.2, the other decisions are obtained using CPLEX. This approach avoids the additional variables and constraints induced by a linearization process and effectively reduces the computation time.

The main components of implementing the GA are given as follows:

*(A) Initial Population Generation.* The genetic code of each individual is a -dimensional matrix generated from a uniform distribution over , where is the number of nodes. We call this matrix Matrix . The p highest entries on the diagonal of Matrix are set to 1 and the other diagonal entries are set to 0. A diagonal entry equal to 1 marks the location of a hub, namely , and a diagonal entry equal to 0 indicates otherwise, . Next, we replace the entries at the intersections of the columns whose diagonal entry equals 1 and the rows whose diagonal entry equals 0. Among the intersections in the same row, the highest entry is set to 1, which means . All remaining entries, other than those with , and , are set to 0, which means . If several intersections in the same row share the highest value, we choose the first one as . Following these rules, we obtain a 0-1 matrix as Matrix , which represents the network structure. The population size stays the same from generation to generation.
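The decoding rules in (A) can be sketched as follows. This is an illustrative Python version, not the authors' MATLAB implementation. The name `decode_individual` is hypothetical, `p` is taken to be the number of hubs, and breaking ties on the diagonal in favor of the lower index is an assumption (the text only states the tie rule for entries within a row).

```python
import numpy as np


def decode_individual(code: np.ndarray, p: int) -> np.ndarray:
    """Turn a real-valued n x n genetic code into a 0-1 network matrix.

    network[k, k] == 1 marks node k as a hub; network[i, k] == 1 with i != k
    marks non-hub i as allocated to hub k.
    """
    n = code.shape[0]
    if code.shape != (n, n) or not 1 <= p <= n:
        raise ValueError("code must be square and 1 <= p <= n")

    # The p highest diagonal entries become hubs. A stable sort on the
    # negated values keeps the lower index first when entries are tied.
    hubs = np.argsort(-np.diag(code), kind="stable")[:p]
    hub_cols = np.sort(hubs)

    network = np.zeros((n, n), dtype=int)
    network[hubs, hubs] = 1

    for i in range(n):
        if network[i, i] == 1:
            continue  # hubs are not allocated to another hub
        # Among the hub columns of row i, the highest entry wins;
        # np.argmax returns the first maximum, matching the tie rule.
        best = hub_cols[np.argmax(code[i, hub_cols])]
        network[i, best] = 1

    return network
```

Every row of the resulting matrix contains exactly one 1, so the single-allocation requirement and the required number of hubs hold by construction.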

*(B) Fitness Value.* The fitness value of each individual is the reciprocal of the profit obtained from the model in Section 3.2.

*(C) Selection Operation.* To produce offspring, the selection process chooses the parents that will mate. The parents are chosen from the current generation by random selection.

*(D) Crossover Operator.* Offspring are generated by applying a crossover operator to the selected parents. In the crossover procedure, and are the offspring, and represent the two parents, and is a random -dimensional matrix chosen uniformly over the interval , where is the number of nodes. The same is used in every combination.
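As an illustration of a crossover of this type, the sketch below uses a common arithmetic (convex-combination) crossover in which one random matrix is shared by both offspring. The exact formula and the interval [0, 1] are assumptions made for the example, and the function name is hypothetical.

```python
import numpy as np


def arithmetic_crossover(parent_a, parent_b, rng):
    """Blend two real-valued genetic codes with one shared random matrix.

    Assumed form: child_a = R * A + (1 - R) * B and
    child_b = (1 - R) * A + R * B, with R drawn uniformly from [0, 1).
    """
    parent_a = np.asarray(parent_a, dtype=float)
    parent_b = np.asarray(parent_b, dtype=float)
    if parent_a.shape != parent_b.shape:
        raise ValueError("parents must have the same shape")

    # One random matrix is drawn and reused for both offspring.
    r = rng.uniform(0.0, 1.0, size=parent_a.shape)
    child_a = r * parent_a + (1.0 - r) * parent_b
    child_b = (1.0 - r) * parent_a + r * parent_b
    return child_a, child_b
```

Under this form each offspring entry lies between the corresponding parent entries, so codes drawn from the unit interval stay in the unit interval.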

*(E) Mutation Operator.* Premature convergence to suboptimal solutions may occur because some genetic codes are lost or left unexplored in the steps above. A mutation process is used to avoid this problem. Mutation makes sporadic, random changes to the offspring’s genetic codes with a given mutation probability. There are three mutation operators: Elimination, Transposition and Conversion. Exactly one of the three operators is selected at random in each iteration. The details of the operators are as follows:

(a) *Elimination*. The hubs are relocated by subtracting the highest entry on the diagonal of Matrix from 1.

(b) *Transposition*. The assignment between the non-hubs and the hubs is changed by transposing Matrix .

(c) *Conversion*. The network structure is redesigned. A -dimensional matrix () is removed from the bottom-right corner of Matrix . This random integer is generated by multiplying the number of nodes by a random number drawn from a uniform distribution . A new -dimensional matrix with entries uniformly distributed between 0 and 1 refills the empty space of Matrix .
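The three operators can be sketched on the real-valued genetic code as follows. This is one reading of the descriptions above rather than the authors' implementation. The function names are hypothetical, Elimination is interpreted as replacing the highest diagonal entry x by 1 - x, and the block size in Conversion is assumed to be the number of nodes times a U(0, 1) draw, rounded up.

```python
import math

import numpy as np


def eliminate(code):
    """Replace the highest diagonal entry x by 1 - x (assumed reading)."""
    out = code.copy()
    k = int(np.argmax(np.diag(out)))
    out[k, k] = 1.0 - out[k, k]
    return out


def transpose(code):
    """Swap the roles of rows and columns in the genetic code."""
    return code.T.copy()


def convert(code, rng):
    """Refill a random bottom-right k x k block with fresh uniform values."""
    n = code.shape[0]
    # Assumption: k = ceil(n * u) with u ~ U(0, 1), forced to be at least 1.
    k = max(1, math.ceil(n * rng.uniform(0.0, 1.0)))
    out = code.copy()
    out[n - k:, n - k:] = rng.uniform(0.0, 1.0, size=(k, k))
    return out


def mutate(code, rng, probability):
    """Apply exactly one randomly chosen operator with the given probability."""
    if rng.uniform() >= probability:
        return code.copy()
    choice = int(rng.integers(3))
    if choice == 0:
        return eliminate(code)
    if choice == 1:
        return transpose(code)
    return convert(code, rng)
```

After mutation, the code would be decoded again into a 0-1 network matrix before the fitness is evaluated.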

After these steps, a new individual is produced. The values and of each individual are passed to the simplified model in Section 3.2 as parameters, and CPLEX is used to obtain the other values and . This process is repeated until a stopping criterion is met.

*(F) Stopping Criterion.* This hybrid algorithm terminates when either of the following conditions is satisfied:

(i) The best solution has not changed after 10 iterations.

(ii) The running time exceeds 8 hours.
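A minimal sketch of such a termination check is given below. It assumes the best value found so far is recorded once per generation. The helper name, its arguments and the use of a monotonic clock are illustrative choices.

```python
import time


def should_stop(best_history, start_time, patience=10,
                time_limit_s=8 * 3600, now=None):
    """Return True when the best value stalls or the time budget is spent.

    best_history holds the best value found so far, one entry per
    generation; start_time and now are readings of time.monotonic().
    """
    now = time.monotonic() if now is None else now

    # Condition (ii): running time over the limit (8 hours by default).
    if now - start_time > time_limit_s:
        return True

    # Condition (i): no change over the last `patience` iterations, which
    # means the last patience + 1 recorded values are identical.
    if len(best_history) > patience:
        recent = best_history[-(patience + 1):]
        return all(value == recent[0] for value in recent)

    return False
```

In a GA loop, the check would be evaluated at the end of each generation after appending the current best value to `best_history`.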

#### 5. Computational Study

In this section, we perform extensive computational experiments to analyze the performance of the proposed mathematical model. The parameter settings are provided in Section 5.1 and the solutions are analyzed in Section 5.2.

##### 5.1. Test Bed

We use the data set from [3]. The values of are set at , , , , , , , , , , and . The cost of installing hub or a central hub is drawn from . Two fare classes () are offered: a business class and an economy class . For link, the transportation cost per unit distance per flight for class is drawn from and the transportation cost per unit distance per flight for is , where . We consider = and , where . Since [3] does not provide any information regarding the distances, we set and . The capacity on link is drawn from a discrete uniform distribution . For link, the capacity is drawn from . The capacities of the aircraft used on link and on link are 100 and 200, respectively. The ticket price at is drawn from a continuous distribution , and the price at is higher than the one at . We set .

We set and the probability of all the scenarios is the same, namely . The demand for each fare class is drawn from the interval between low demand and high demand . of the demand comes from the economy class and are from the business class. We also test six weight vectors , such that , , , , and . Note that the weight combination means that the objective is to maximize the average profit over all scenarios, whereas the weight stands for maximizing the worst-case profit. The mixed weight combinations , , and maximize a weighted sum of the average-case and worst-case profits over all scenarios. In addition, the probability of crossover is and the probability of mutation is . The maximum number of generations is 50 and the population size is 20. We extend the number of scenarios to 10. When the number of scenarios is three times more than the number used in [3], our problem becomes more challenging if the population size is still set to 50 as in [3]. We therefore reduce the population size to 20 so that the procedure can finish within a limited number of iterations.

##### 5.2. Results

This hybrid algorithm was implemented in MATLAB R2017b using YALMIP R20171121 and IBM ILOG CPLEX Optimization Studio 12.8.0. All runs were performed on a MacBook Pro with a 3 GHz Intel Core i7 processor and 8 GB of RAM running macOS High Sierra 10.13.3. To evaluate our mathematical formulation, we conducted 216 computational experiments covering 6 weight vectors, 12 network sizes and 3 discount factors (6 × 12 × 3 = 216).

This section is organized as follows. First, we report computational results under different parameter values and , weight vector and discount factor in Tables 1 and 2. Specifically, the results under and are shown in Table 1 and the results under the mixed weight combinations are presented in Table 2. Second, Figure 2 compares the performance of our proposed model with the model in [3] and illustrates the impact of on the profits. Finally, we present the worst-case values in Table 3 and investigate the relationship between the worst-case value and the profit in Table 4.

We first examine the profits under weight combinations and . The results obtained from solving the simplified model (11)-(13) and (2)-(10) are summarized in Table 1. The first three columns report the discount factor , the number of nodes and the number of hubs . The columns under the heading () give the profits under the weight combinations and . The last column shows the deviation between the two values under and , computed as “profit under - profit under ”.

Observe from Table 1 that the profit varies considerably with the number of nodes in the network. The highest profit is achieved at and the lowest profit is obtained at . The values in the column labeled ‘Dev’ indicate that the profit under is higher than the one under for a given node number in almost all cases. As can be seen, the deviation between profit and is insensitive to an increase in . We next observe the effect of the hub number on the results presented in Table 1. The highest profit is produced when , whereas the lowest profit is obtained at . When increases, the deviation also increases in the majority of cases. For a given , the profit increases as increases in all cases. Moreover, for a given , an increase in the node number leads to a decrease in profit. The largest deviation is always obtained at . We also test all instances with three different levels of discount. When the level of discount is increased from to , a higher profit results in most cases. In particular, the profit stands out significantly when , with only a few instances as exceptions. The profit under remains higher than the one under as the discount increases.

Table 2 provides the solutions by solving our model optimally under mixed weight vectors. The discount factor, node number, and hub number are shown in the first three columns labeled as , and . The next four columns represent the profits under different weight vectors, , , and . To demonstrate the results with different levels of discount, Table 2 is split horizontally into three parts for , and .

Regarding the node number , the highest profit is found among the instances with and the lowest profit is always reported at . We observe that the profit is not proportional to the network size. When the network grows beyond a perfect structure (e.g. it’s in our paper), the profit does not increase further. In other words, not all the demand can be satisfied and some of the O&D pairs are unprofitable in a larger network with over 15 nodes. We also analyze the impact of the hub number on profits. The minimum profit is at and the maximum profit is at . This is reasonable because a hub not only greatly reduces the number of links but also brings economic benefits as a transportation and business center. Hence, more hubs can attract more business opportunities. Moreover, the profit tends to be lower when the ratio of hubs to nodes is less than 20%, whereas a ratio above 35% can result in higher profit. As can be seen, the profit changes significantly with discount factor . It is clear that profits increase along with increasing , for a given , and (, ). For a given , and , the profit decreases along with increasing and decreasing in almost all cases. For example, the maximum profit is at (0.2, 0.8) and the minimum is at (0.8, 0.2).

Figure 2 provides insight into how the profit changes under different parameter values. Each subgraph in Figure 2 refers to one specific network. For each network, we depict the profits under different discount factors . The x-axis lists the values of the weight vectors (, ) and the y-axis shows the profit. To demonstrate the advantage of the proposed model, we compare the computational results with [3]. The solutions in [3] correspond to the profits under weights (1, 0) in our experiments. As mentioned before, the weight vector (0, 1) yields the best profit in most cases; in other words, the values under (0, 1) provide an upper bound in almost all cases. The lowest profits are achieved under the weight combination (1, 0) for each network size in the majority of cases. Moreover, the profits under the weights (0.2, 0.8), (0.4, 0.6), (0.6, 0.4) and (0.8, 0.2) are higher than those under (1, 0) in most cases. We believe that these results give a good indication of the added benefit of a combined average-case and worst-case analysis for the integrated problem. We also conclude from Figure 2 that a higher discount factor lets the decision maker obtain significantly more profit. Hence, the discount factor is an important indicator in designing and operating a hub network.

The value of is listed in Table 3. In addition to the columns in Table 2, the columns labeled ‘’ report the values of under the six weight vectors. As noted in the problem definition, indicates the worst-case value over all the scenarios. Note that the value of in column (1, 0) is always 0 because the experiment under (1, 0) maximizes only the average profit; in other words, is absent from objective function (11). For a given , and , decreases along with increasing and decreasing . The highest worst-case values are obtained with the weight vector (0, 1) in the majority of cases, whereas the lowest worst-case values are achieved with the weight vector (0.8, 0.2) in most cases. Conclusions similar to those from Tables 1 and 2 can also be drawn from Table 3.

Table 4 explores the effect of the worst-case value on the profit. The ratio is computed as “Worst-case value / Profit”. The worst-case value is lower than the profit in the majority of cases. However, a ratio above 1 indicates that the worst-case value exceeds the profit, which occurs as the network size increases for a given . In addition, the value of the ratio is insensitive to , for the given , and (, ). Under different weight combinations, the ratio is usually the same for the same network size.

#### 6. Conclusions

This paper focused on the integrated problem of an uncapacitated single allocation p-hub median location problem and a revenue management problem. We optimized the decisions on hub locations, link routing and the protection levels of airline tickets for multiple fare classes in the first stage, and the booking limits of tickets in the second stage. We formulated a two-stage stochastic programming framework that considers a weighted sum of average-case and worst-case profits. The proposed model gives flexibility in decisions on network configuration and ticket sales by adjusting the relative emphasis on the average-case and worst-case values over all scenarios.

In addition, a GA was applied to solve the problem. We performed 216 computational experiments, covering 6 weight vectors, 12 network sizes and 3 discount factors under 10 scenarios. We analyzed the profits under different weight vectors, the performance relative to the model in [3], and the effect of the worst-case value on profits. The results demonstrated that the proposed formulation outperformed the model in [3]. The largest profits were obtained at the weight combination (0, 1), whereas the lowest profits were achieved at the weight combination (1, 0). The profit decreased along with increasing and decreasing . Moreover, the discount factor was an important indicator in designing and operating a hub network.

Future studies could extend this integrated problem to capacitated hubs. More efficient meta-heuristic algorithms could be developed to shorten the computation time. Using a robust controller to deal with the uncertainty could also be a promising approach, e.g. [53].

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was supported by the National Natural Science Foundation of China (71371140).