#### Abstract

China Railway Express is developing rapidly, but the point-to-point direct organization mode has brought many problems to it. Therefore, this paper proposes to construct a hub-and-spoke network and adopt the “collecting and transportation” organization mode. In this paper, based on the single distribution p-hub median problem, a dual-objective planning model was constructed by considering the characteristics of CR Express in terms of cost and time. In addition, considering the port as a key node of international rail transport network, it plays a vital role for CR Express. Therefore, a hub-port allocation model was constructed to determine the hub-port allocation relationship. Furthermore, a Lagrangian relaxation heuristic algorithm was designed to solve the model built for CR Express transportation network. Finally, based on the constructed model, the actual operation data of CR Express were used in the designed example to verify the effectiveness and applicability of the models and methods.

#### 1. Introduction

The CR Express is organized by China National Railway Group Corporation Ltd and operates according to the rule of fixing trains, routes, schedules, and full-time operation times. The international railway inter-modal container train is an important carrier for deepening economic and trade cooperation between China and the countries involved in transit and an important method for promoting the construction of the “Belt and Road.” In terms of international corridors, CR Express paves three routes situated in east, middle, and west (see Figure 1).

The brand of CR Express has been promoted rapidly and the accessibility of its transport routes is constantly expanding. Furthermore, the category of goods has been gradually enriched and the quality of CDB has been greatly improved (see Figure 2). As of February 2020, the cumulative number of trains engaged for CR Express has exceeded 15,000. The number of CR Express lines has reached to 68, basically achieving a two-way transportation balance.

The CR Express is organized basically as a point-to-point direct mode. This mode means that each train meets the fixed grouping requirements in the originating city and does not carry out transfer and assembly operations in the moving path until it reaches the destination city. The point-to-point direct organization model causes problems such as disordered competition for collecting goods. In contrast, the hub-and-spoke mode is more reasonable to substitute it.

The research of hub-and-spoke network mainly focuses on the network construction, network optimization, and algorithm.

O’Kelly [1, 2] first proposed the concept of a hub-and-spoke network which means it can concentrate the traffic volume to some hubs from other areas through the “spoke” routes. In the classic model, to reflect the utility of economies of scale, it is assumed that a fixed percentage discount is used for transportation costs between hubs. Campbell [3, 4] classified this model in four types of discrete hub location problems, including the p-hub median problem which is studied in this paper and the discrete hub center and hub covering problems. In order to avoid the failure of hubs and reactive disruption management, Yu et al. [5] designed a set of reliable hub-and-spoke network models by involving the selection of backup hubs and alternative routes.

The defects of the classic hub-and-spoke network, such as the congestion problem caused by the aggregation of traffic and the refined expression of the economies of scale between hubs, appeared after several studies. O’Kelly and Bryan [6] believed that the scale effect caused by the aggregation of traffic between hubs led to a reduction in cost. The original assumption of a fixed transportation cost will not only erroneously calculate the overall cost of the network but may also lead to deviations in hub location and node allocation. It designed a convex cost function to represent the scale effect of traffic aggregation and developed the FLOWLOC model which has the following characteristics: multiple allocation; continuous cost function; objective function linearized; and cost function based on unit distance. It also proved that the optimal solution of the FLOWLOC model is not a user equilibrium solution. Marianov and Serra [7] proposed a hub location and path allocation model considering the congestion effect. The hub is regarded as an M/D/c queuing system, and the probability formula of the system customer is used as capacity constraint. In addition, due to the large computational complexity of the model, it used a tabu search algorithm to solve it. The favor of congestion for hub-and-spoke networks was also commented by Elhedhli and Wu [8]. The problem was formulated as a nonlinear mixed-integer program (NMIP) that explicitly minimizes congestion, capacity acquisition, and transportation costs. And it viewed the hub-and-spoke system as a network of *M*/*M*/1 queues.

Although the model is originally used in the air freight context, Yaman et al. [9] gave an example for studying the inland freight transport system. They proposed the assumption that vehicles could be stopped along the route to its destination for some handling, sorting, and other operations. And a minimax model was constructed to calculate the arriving time of the last vehicle and aimed to minimize it. For the modelling details, some inequations were used to intensify the constraints by considering that the time costed in handling, sorting, and other operations could be the most part of the total transport time needed. Jeong et al. [10] focused on the goods transport by rail. They tested the viability of several railway networks containing various numbers and locations of potential hubs. A central planer was used to find transport routes, frequency of service, length of trains, and transportation volume. Huang et al. [11] studied the reconfiguration of bus services in an urban area with a newly constructed rail system by considering the bus stops as non-hub nodes and selecting hubs from rail stations.

The traditional hub-and-spoke network research basically aims at the minimum cost when constructing the model. Costa et al. [12] adopted a dual-objective planning model which involves the transportation cost and time when studying the single-allocation hub-and-spoke network. And it aimed to minimize the total transport time of the network and the total service time of hubs. de Camargo et al. [13] studied the design of multi-allocation hub-and-spoke networks in the case of hub congestion. For the congestion problem, it was treated as a convex cost function. Kian and Kargar [14] studied the hub location problem by defining a congestion cost function in power law distribution and proposed a reinforcement method using effective inequalities based on perspective cuts in mixed-integer nonlinear programming. da Costa Fontes and Gonclaves [15] proposed a hub-and-spoke network that combines short alternative paths and scale economy effect to reduce the cost of demand routes in liner transportation.

As for solving the hub-and-spoke model, several methods are usually used for this type of optimization problem. Yu et al. [5] developed the Lagrangian relaxation and branch-and-bound methods to efficiently obtain optimal solutions. Chen et al. [16] analysed characteristics of the optimal solution and designed the Lagrange relaxation algorithm to solve the HALPUD mathematic model. Yang et al. [17] established a mixed-integer nonlinear programming model. Luo and Zhang [18] studied the single-allocation capacity-free network model from the perspective of graph theory and designed a heuristic algorithm to solve the minimum sharing tree problem based on parametric algorithm.

In the present article, we pay more attention to the overall transport network, the embodiment of economies of scale, the consideration of congestion problems, port problems, and others. The concept and the research on many aspects of hub-and-spoke network are relatively mature. There are abundant research studies on scale effect and congestion problems, which have a good reference significance in the study of hub-and-spoke network of China-Europe freight train.

#### 2. Materials and Methods

##### 2.1. Hub-and-Spoke Model of CR Express Transportation Network

The form of CR Express transportation network is different from that of general hub-and-spoke network. As an international railway inter-modal transportation service, it must pass through different border ports at the boundary, so that the hub-and-spoke network of CR Express is more complicated at the node level. Therefore, this paper aims to construct the transportation network by two stages: hub-and-spoke network construction and hub-and-port allocation. In general, the CR Express transportation network can be summarized as a “three-point and three-line” structure (see Figure 3). Therefore, the construction of CR Express transportation network needs to consider two issues: the basic structure of hub-and-spoke model and the allocation of hub and ports. And the last one can be divided into three missions as hub site selection, the distribution relationship between non-hub and hub, and that between hub and port.

For the problem of hub and port allocation, the output of hub flow and flow direction from the hub-and-spoke model is used as input to build a dual-objective hub and port decision model with transport costs and transport time as the objectives.

###### 2.1.1. Model Assumptions

(1)Do not consider the impact of changes in international political relations, wars, public health events, and joint competition on hub selection and node distribution.(2)The nodes in the network are smooth with each other, forming a fully connected network diagram.(3)When a node is connected to other nodes, the shortest path is used for goods transportation without other path considered.(4)The non-hub nodes in the network are not interconnected, that is, the traffic between non-hub nodes must transit through at least one hub point.(5)The queuing of network flow at the hub node conforms to the *M*/*M*/1 queuing model, that is, the arrival process follows the Poisson distribution, and the hub node service process follows the negative exponential distribution.(6)Assuming that the channel line between ports is fixed, at this time, the channel containing two ports is regarded as a node with time and cost parameters, that is, the “three in two” and “two in three” problems are not considered.(7)The research object is the transportation network constructed, considering the flow of goods in a long period, so it does not consider the impact of the schedule and the number of groups participating in the transportation.

###### 2.1.2. Variable Descriptions

All variables used for modelling are defined and listed in different tables after several functions or model formulas.

*Model Building*. The p-hub median model in condition of fixed discount coefficient without capacity constraints is a classic site selection model. The problem is to select *p* nodes as hubs with the goal of minimizing transportation costs. For the model, it can be described as shown in equations (1) to (7). The variables related to the model are shown in Table 1.s. t.

The model aims at minimizing the total cost of the transportation network and determines the hub node sets and the distribution relationship between non-hub nodes and hub nodes. The scale economy effect is represented by the discount factor for transportation costs between hubs. At the same time, the discount factor is applicable to any connection between any hub pair, regardless of the size of traffic flow.

The classic hub-and-spoke network model (linear model) measures the scale economy effect with a fixed discount factor for calculating transportation costs between hubs and does not consider the impact of flow volume between hubs, which is inconsistent with reality. In order to more scientifically reflect the changing trend of transportation costs relative to the traffic flow, a nonlinear convex function is necessary to be designed to express the relationship between cost and flow (shown as a nonlinear model). This convex function can realize that the transportation cost between hubs increases with the increase of the flow between hubs, but its growth rate decreases. Thus, the following cost function (equation (8)) is introduced to represent the transportation cost between hubs in our model, and the related variables are shown in Table 2.

In the nonlinear model, the growth rate of the transportation cost between hubs decreases with the increase of the flow between the hubs. That means the first derivative of the transportation cost between the hubs to the flow is greater than zero, and the second derivative is less than zero. The use of this nonlinear convex function (equation (9)) to represent transportation costs between hubs is more logical than the other one in linear.

Although the nonlinear cost function is more reliable to the actual situation, it also increases the difficulty of problem solving. For this fact, the nonlinear part needs to be linearized. Nonlinear cost functions between hubs can be linearized by piecewise linear functions. As shown in Figure 4, a set of linear functions is used to approximate the nonlinear cost function between hubs. As the flow between hubs increases, the piecewise linear function alternates to approximate a nonlinear function, generating an intersection point at each alternation, passing a new threshold at the intersection point, and generating a linear function with a smaller slope, which represents the acceleration of cost slowdown (see Figure 4). The piecewise linear functions have advantages to not only represent nonlinear functions but also use linear methods to solve problems. The meaning of characters in Figure 4 is shown in Table 3.

According to the fixed-point formula, the convex piecewise linear function can be expressed by the second kind of special ordered set (SOS2). The model expressed by SOS2 is as equations (10)–(17). Several first occurred variables are illustrated in Table 4.s. t. (2)∼(7):

Constraint equation (11) means that all transportation demands between each OD point pair are met, constraint equation (12) determines the flow between hub *k* and *m*, and constraint equations (13)–(17) are used for SOS2 linearization.

The processing capacity on the hub node is limited, but in reality, traffic exceeding the capacity limit will not be embargoed. Most cases are queued at the hub node, so the congestion is involved in the total transportation time in our target. So, in the first stage, the transportation time is treated as a soft constraint.

The total transportation time of CR Express network mainly includes two parts: transportation time and congestion time. Transportation time can be divided into three subparts: in transit time along the route, fixed processing time at hub, and hub assembly time. Among them, the congestion time is the queuing waiting time caused by exceeding the capacity of the hub.

The transportation time formula is shown in equation (18), and the variables used here are listed in Table 5.

The service of CR Express has already existed congestions at some nodes. In the foreseeable future, congestion will inevitably prevail if without any other improvement. On the one hand, with the further development of CR Express, the traffic volume will be further larger, and on the other hand, the use of transport organization model called “combination of trunk and branch, collecting and distribution” will also aggravate the container accumulation at hub nodes. Considering that CR Express generally chooses to wait instead of replanning the path when it meets a congestion at nodes, it is more realistic to involve the congestion time in the goal of modelling. For treating the congestion as a target, it is more reasonable to consider the congestion time as part of the total transportation time. So, the *M*/*M*/1 queuing model was used, that is, it is assumed that the arrival process follows Poisson distribution, and the service process at hub nodes follows exponential distribution. The congestion at hub nodes is regarded as the ratio of total flow volume to the remaining capacity; then, the congestion time at the hub can be noted as follows:

Therefore, the average congestion of the entire network is expressed as follows:where can be expressed as .Therefore, the total objective transportation time considering the congestion effect can be expressed as equation (21) with all first occurred variables’ description in Table 6.

##### 2.2. Lagrangian Relaxation Heuristic Algorithm

The variables in type “0-1” for CR Express hub-and-spoke network model have reached , and the model also contains SOS2 variables and constraints. So, it is a very complex nonlinear mixed-integer programming model. In the case where one instance contains fewer nodes and the problem scale is not large, the Lagrangian relaxation algorithm can be used.

*Construct and Solve the Lagrangian Relaxation Problem*. Constraint equations (2), (5), and (6) are the main constraints of the model built, so all of them should be relaxed by using Lagrange multiplier , , () as designed. After that, we can get the Lagrangian relaxation model as shown in equation (22). All relevant Lagrangian variables are listed in Table 7.s. t. (3)∼(4), (11)∼(17).

According to the nature of decision variables, such as noted for the distribution of non-hub nodes to hub nodes and for path allocation, the Lagrangian relaxation model can be further decomposed into two problems.

###### 2.2.1. SUB-1

s. t.

###### 2.2.2. SUB-2

s. t. (2)∼(6), (11), and (21).

Without considering constraint equation (15), SUB-1 can be decomposed into *k* smaller-scale problems, one for each hub selected.

###### 2.2.3. SUB-1-*k*

s. t.

It can be seen from the objective function form of SUB-1-*k* problem that it is a nonlinear problem. To solve SUB-1-*k* problem, the objective function needs to be linearized. In this paper, variable substitution and piecewise linear method are used to divide the nonlinear part into linearized by introducing the form: .

According to the above formula, it has the relation as . Therefore, letting, , then we can get the formula as . Because the *s _{k}* is a convex function which takes an independent variable noted as

*R*, the

_{k}*s*can be approximated by a set of tangents. Its schematic diagram can be found in Figure 5. Given a set of tangent points , it can be approximated by the following formula: .

_{k}This indicates that . Due to , the set of values of is a finite set, so the set H is also a finite set, so it has the formula as .

Thus, SUB-1-*k* model is now linearized as SUB-1-*k*(H):s. t.

Since the SUB-1-*k* model is a min-type problem, when the model obtains the optimal solution, there is at least one constraint equation (36) which meets the formula as

It should be noted that because , then SUB-1-*k*() is an approximation of SUB-1-*k*(H). The nonlinearity of the congestion function is eliminated by introducing constraint equation (34), and the model is transformed into a mixed-integer programming problem with continuous variables. This paper uses the cut plane method to solve the SUB-1-*k* model by iterating. In this method, starting from a given set of initial constraints, the remaining constraint equation (34) is added during the iteration process as needed. For any finite subset of SUB-1-*k*(H) constraints with , the SUB-1-*k*() is the relaxation of SUB-1-*k*(H).

Therefore, its optimal objective (denoted as v[SUB-1-*k*()]) produces a lower bound that is better than the optimal objective of SUB-1-*k*(H) model (denoted as v[SUB-1-*k*(H)]). That bound is also lower than SUB-1-k’s optimal goal which can be noted as v[SUB-1-*k*]. In addition, because the constraint set of SUB-1-*k* is a subset of that of SUB-1-*k*(), the optimal solution (denoted as ()) of SUB-1-*k*() is feasible for SUB-1-*k* (). Therefore,produces an upper bound for v[SUB-1-*k*].

Solving the SUB-1-*k* model needs to solve the problem of SUB-1-*k*() model which is given a set of initial approximation points (). Given a set of approximate points() for the *n*th iteration, the optimal goal of SUB-1-*k*() produces a lower bound of SUB-1-*k* which can be noted as , and when that solution is brought into the SUB-1-*k*, it can generate an upper bound noted as . The optimal upper bound is . If the gap between the upper and lower bounds is still relatively large and the convergence threshold is not reached, a new approximation point () is generated and the formula noted asis added to the SUB-1-*k*() model. At this time, the approximate point set is updated to , and SUB-1-*k*() is replaced by SUB-1-*k*(). It should then repeat the above process continuously until the gap between the upper and lower bounds converges to the threshold. At this time, finding the solution of SUB-1 is equivalent to selecting *p* nodes that minimize v[SUB-1-*k*]. of *p* hub nodes are retained; then let , and of non-hub nodes except the *p* hub nodes should be set to 0.

The SUB-2 model can be calculated using a solver, but due to the large boundary of Lagrange, the initial result of relaxation is poor. This situation can be solved by adding an effective cut to the subproblem. Since there are *p* hubs, the number of connections between hubs is , so an effective cut can be formed as .

In addition, since the connection between hubs is bidirectional, the following constraints can also be added: .

*Calculate the Feasible Solution of the Original Problem*. The solutions obtained by SUB-1 and SUB-2 may not satisfy the three constraints that are relaxed and may not be feasible for the original problem, that is, there may be cases where the node is not assigned to any hub, so it is necessary to construct a feasible solution of the original problem. According to the literature [19], for the solution obtained from SUB-1 and SUB-2, check whether each node *i* is assigned to the hub node; if it is assigned, keep the original value; if not, set it to 0, Then, select the *p* hubs with the lowest cost. For , whose value is equal to , the upper bound can be obtained by bringing and into the original model objective function. The specific construction method is as follows: Step 1: the hub assignment plan of node *i* If , then: (1.1) let; (1.2) find ; (1.3) set . Step 2: path allocation plan of node pair (*i*, *j*) 2.1) Set.

##### 2.3. Updation of Lagrange Multiplier

For a set of Lagrangian multipliers , the optimal solution of the Lagrangian relaxation problem gives the lower bound of that of the original problem, and that lower bound corresponds to the optimal multiplier of the dual Lagrangian problem. The optimal Lagrangian boundary is .

The secondary gradient descent method can be used to search , and the upper bound() can be obtained from the feasible solution of the original problem, and the lower bound() can be obtained from the Lagrangian relaxation problem of the original problem. The secondary gradient descent method adjusts the multiplier according to the upper and lower bounds. The specific methods are as follows:

The step size noted as *t* is determined by the following formula:

For a given step size factor , if the value of stays at the same value level after several iterations, then it should be halved.

#### 3. Results and Discussion

##### 3.1. Node Set of CR Express Network

In this paper (see Table 8), the main node cities of CR Express which maintain a normal operating capacity are selected as the research objects. The node sets are determined as {Chongqing (1), Zhengzhou (2), Wuhan (3), Chengdu (4), Hefei (5), Yiwu (6), Xi’an (7), Suzhou (8), Shenyang (9), Yingkou (10), Duisburg (11), Hamburg (12), Tilburg (13), Warsaw (14), Lodz (15), Malaszewicze (16), Minsk (17), Moscow (18), Madrid (19), Budapest (20)}. Their relative geographical location is shown in Figure 6.

##### 3.2. Parameter Determination

In this paper, the calculation instance only considers the basic data related to the railway transportation, including the container traffic volume, railway transportation mileage, transportation cost, transportation time, and other parameters between the city nodes. The container traffic data between the nodes in each city are basically based on the actual data of each train company, but because some train companies have not published relevant data, it is necessary to make reasonable assumptions about the missing data; railway transportation mileage data come from the map website; the transportation cost per kilometer per unit of container varies from country to country. According to the survey results, the cost of railways in the CIS countries is determined as 0.25 dollar/(km ∗ container), the cost of European railways is 1.978 dollar/(km ∗ container), and the cost of the domestic section line is 0.7 dollar/(km ∗ container). The unit container transportation time can be calculated according to the quotient of the distance between nodes and the average speed. According to the survey results, the average speed of the domestic railway section is 1400 km/day, the average speed of the original CIS country railway section is 1000 km/day, and the average speed of railway section in Europe is 800 km/day.

This paper selects three different functions and divides them into four segments to represent the discount function. The slope of each segment is shown in Table 9. This paper uses 4 breakpoints to handle the traffic between hubs based on the overall traffic flow of the CR Express network, which are {0, 200, 500, 1000}. The selection of these breakpoints ensures that the overall network traffic flow is always included in this range. At the same time, it ensures the selection of values. For Lagrangian relaxation, the initial multiplier , , is set to “−106, 106, 106” respectively, and the step factor is set to 1.5. When the gap between the upper and lower bounds () is less than 1% or the number of iterations is greater than 3000, the Lagrangian relaxation algorithm is terminated. In the initial stage for solving the SUB-1−k(H) model, 11 initial tangent points noted as are set, so that the initial approximation of the congestion function is lower than 0.01. The objective function scale factor is set as .

Values of are divided into three levels: high, medium, and low. The high level is related to the case of Chinese container station, and the value is determined to be 6000. While the medium level is related to the case of foreign nodes, the value is 4500. At last, the low level has a value of 3000 which is related to the case of Chinese noncontainer station. The average number of non-hubs to hub assembly groups *M* is 20, and the ratio of the average number of vehicles to the number of hub groups is 1/2 (see Table 10).

The value of is divided into three levels: high, medium, and low (see Table 11 for specific values).

##### 3.3. Calculation Results

At first, based on the dataset in the instance, the transportation cost and time index of the point-to-point direct transportation organization mode were calculated as follows in Table 12. The calculation results below were obtained by using the software of Matlab2014b and CPLEX12.6.3.

When the transportation cost between hubs was measured as a function (, , and ), the results of the hub-and-spoke model were as follows in Tables 13–15. The result of the calculation instance was expressed in the following form {hub point (backup hub point) |non-hub point assigned to the hub point}.

As for demonstrating the effectiveness of Lagrangian relaxation algorithm proposed above, variations of lower bound and upper bound of the algorithm were plotted for the last case using the transportation cost function (see Figure 7). And it showed that the iteration was conducted by 11 times for achieving the terminal condition.

This paper gives an instance of CR Express by using the more reliable data collected and repeats the Lagrangian relaxation heuristic algorithm by importing three different discount cost functions called , and for representing the scale economy effect. For each discount cost function , it can be recognized that the hub-and-spoke model can always be obtained, and for each hub point in the transport network, several non-hub points are assigned to it. But different hub points are selected for different cases involving different discount cost functions, and there exists the gap of transport cost and time among these three cases.

By identifying the three results for each discount cost function , it can be recognized that whether the discount cost function adopts a lower slope or a higher slope has no effect on the location and route allocation of some hubs, and “Chongqing, Zhengzhou, and Marashevich” are usually selected as the hub nodes. Thus, these city nodes are more important than the others when operating the CR Express freight trains.

When it involves a more moderate pricing strategy like choosing the discount function , the result that most traffic flow concentrates to a certain hub pair will occur, which causes more congestion. The reason is when the discount coefficient of discount cost function decreases, the transportation network will gradually weaken the influence of the transportation cost per kilometer of unit container and the distance between nodes, thus leading to the network traffic to concentrate on some hubs in a wider range and prolonging transportation time.

These three discount cost functions () used above have low, moderate, and high discount rates separately, which can correspond to the different development periods of CR Express freight transportation. It can be seen from the calculation results that there is a situation that a single hub has no affiliated non-hub nodes, indicating that fewer hub nodes can be selected when the transport scale is small in the early stage for CR Express.

Furthermore, as for the number of hub nodes selected in the transport network, it can be determined in advance before launching the Lagrangian algorithm. Thus, for different number of hub nodes expected, different hub-and-spoke models are constructed and the transport cost and time vary according to them. For one specific case, it is advised to repeat calculating for getting the proper hub-and-spoke transport network.

#### 4. Conclusions

This paper constructs a transport network by identifying different node levels and proposes a modelling method to solve the route-decision problem by selecting some hub nodes with the objective to minimize the cost. So, the Lagrangian relaxation heuristic algorithm has been designed to solve the complex model built.

As for the study case, from the above calculation results, it can be seen that the axle-spoke model is beneficial to CR Express to reduce transportation costs. In addition, the hub location and path allocation of the CR Express and spoke network design are affected by the distribution of goods. Chongqing, Zhengzhou, and Malaszewicze are the public hub nodes under most strategies and attract more non-hub nodes. In a more active strategy, traffic will converge towards a certain hub pair, causing congestion on that hub pair to increase; the three cost functions represent lower, moderate, and higher discount rates, which can correspond to CR Express development period, and it can be seen from the calculation results that there is a single hub without affiliated non-hub nodes, which indicates that fewer hub nodes can be selected when the scale in the early stage is smaller.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was supported by the Projects of Science and Technology Research and Development Plan of China Railway Corporation (P2018X003) and Ministry of Education Humanities and Social Sciences Project (16XJCZH001).