Abstract

The metro system is an important component of the urban transportation system due to the large volume of passengers it carries. Hub stations connecting metro and high-speed railway (HSR) networks are particularly critical in this system. When HSR trains are delayed by a disruption on the HSR network, passengers on these trains arriving at the hub station at night may miss their last metro connection. The metro operator can thus decide to schedule extra metro trains at night to serve passengers from delayed HSR trains. In this paper, we consider the extra metro train scheduling problem, in which the metro operator decides how many extra metro trains to dispatch and their schedules. The problem is complex because (i) the arrival times of delayed HSR trains are usually uncertain, and (ii) the operator has to minimize operating costs (i.e., the number of additional trains and the operation-ending time) while maximizing the number of served passengers, which are two conflicting objectives. In other words, the problem we consider is stochastic and biobjective. We formulate this problem as a two-stage stochastic program with recourse and use an epsilon-constraint method to find a set of nondominated solutions. We perform extensive numerical experiments using realistic instances based on the Beijing metro network and two HSR lines connected to this network. We find that, in out-of-sample evaluation, our stochastic model outperforms a deterministic model that relies on delay forecasts by 3–5%. Moreover, we show that our solutions are nearly optimal by computing a perfect information dual bound and obtaining average optimality gaps below 1%.

1. Introduction

Metro lines are typically connected with high-speed railway (HSR) lines at some transfer stations to provide seamless transfer service for passengers. Indeed, the metro system is an important component of the urban transport system and is crucial to meet the transportation demand of passengers from HSR trains, especially late at night when fewer buses and taxis operate. For instance, HSR passengers arriving at Beijing South Railway Station at night prefer using metro trains, since waiting for a taxi usually takes more than one hour and taking buses at night is inconvenient [1].

Inevitably, unplanned events such as adverse weather conditions or infrastructure failures occur in HSR operations and may cause a major disruption. For example, on April 21, 2019, a disruption caused by an equipment failure due to heavy rain occurred on the Beijing–Guangzhou HSR in China, resulting in more than 50 delayed trains, with the longest delay being nearly 6 hours. On July 2, 2020, nearly 30 trains were delayed on the Shanghai–Kunming HSR in China because of a flood. Xu et al. [2] present several statistics on disruptions affecting the Chinese HSR and show that events involving more than 10 trains are not infrequent. If a major disruption occurs at night, passengers from delayed HSR trains will arrive at the transfer station very late and risk missing the last metro trains. The metro operator may thus consider running extra metro train services on the connected metro lines to transport passengers from the hub transfer station. We define an extra metro train service as an additional train service that operates later than the last train on the same metro line and is scheduled only in emergency cases, such as disruptions in railway or aviation systems, major events such as large concerts, or particularly bad weather conditions such as blizzards. These situations cause a sudden flow of passengers to one or more metro stations. In this study, we consider the case of a disruption occurring on the HSR network.

In practice, the duration of a disruption is usually uncertain, so HSR trains arrive at their destination with a delay that is unknown when the disruption starts. When the metro operator is informed by the HSR operator about a disruption, it only receives information on which HSR trains are affected and some estimate of their delays in the form of forecasts or probability distributions, and it must decide how many extra metro train services to schedule. It is important that the metro operator takes this decision as soon as it is informed about the disruption, so that there is enough time to notify the drivers, metro staff, and passengers on the delayed HSR trains. In other words, this decision has to be taken under uncertainty, before knowing the exact arrival times of the HSR trains affected by the disruption. Subsequently, once the HSR delays (and hence arrival times) are disclosed, the metro operator is responsible for scheduling these extra metro train services. This scheduling task is particularly challenging for the following three reasons:
(1) The extra metro trains need to be synchronized at the transfer station with multiple HSR trains with different arrival times. Moreover, the stochastic arrival times of HSR trains require making some scheduling decisions under uncertainty, which complicates both the formulation and the solution of the scheduling problem.
(2) The extra metro train timetabling problem needs to balance the cost incurred by the metro operator against the “passenger costs,” i.e., the number of passengers who miss the extra metro trains. To illustrate, consider the following two solutions: (i) operate one extra metro train service each time an HSR train arrives. This solution is ideal for passengers since they can all leave the hub station quickly. However, the metro company then operates many extra train services, which is expensive due to the high number of drivers and staff involved late at night; (ii) operate one extra metro train only when there are enough passengers to fill it, i.e., after multiple HSR trains have arrived. This solution is economical for the metro operator, but passengers on the earlier-arriving HSR trains are subject to long waiting times.
(3) Although several researchers have studied the classical “non-extra train” metro timetabling problem (i.e., not related to the dispatch of additional trains), the problem of extra train timetabling is quite different and exhibits some nontraditional features that should be considered, which we summarize in Table 1.

At present, metro operators mostly make decisions regarding extra trains manually based on their experience and professional judgment, that is, without relying on analytical tools such as forecasting and optimization. Moreover, most related models in the extant literature rely on deterministic assumptions about future train arrival times, whereas in reality these values are not known in advance and should be treated as stochastic [3]. Therefore, there is a need to study and solve the problem of extra metro train scheduling under uncertain arrivals of delayed HSR passengers, which we consider in this paper.

We tackle the extra metro train scheduling problem by proposing a novel two-stage stochastic mixed-integer linear program to decide the number of extra metro train services to operate and the corresponding timetable under different HSR train delay scenarios. The goal of our model is, on the one hand, to minimize the cost incurred by the metro operator (number of extra train services and operation-ending time) and, on the other hand, to minimize the number of passengers who fail to get service. Since these two objectives conflict, our optimization model is biobjective. We use an epsilon-constraint method to generate a set of Pareto-optimal solutions. We also formulate (i) a deterministic benchmark model that relies on a single forecast of the uncertainty instead of multiple scenarios and (ii) a perfect information model that relaxes the nonanticipativity constraints and provides a dual (lower) bound on the optimal cost.

We performed extensive numerical experiments using realistic data from the Beijing metro system and the Beijing–Tianjin and Beijing–Shanghai HSR lines. We model the stochastic HSR train delays at the hub station using different probability distributions, including Gaussian, uniform, and Weibull. We found that, in out-of-sample evaluation, solutions from our stochastic programming approach outperform those from a deterministic method by 3–5% on average, which is substantial and shows the benefit of accounting for uncertainty explicitly via multiple scenarios. Moreover, using the perfect information dual bound, we establish an average optimality gap below 1%, indicating that our stochastic programming solutions are nearly optimal.

The rest of this paper proceeds as follows. In Section 2, we review the literature on train timetabling problems and state the contribution of this paper. In Section 3, we formalize the extra metro train scheduling problem and its assumptions. In Section 4, we introduce the passenger and metro operator costs and derive the stochastic programming formulation of the problem. In Section 5, we present our numerical study and discuss the results. We conclude the paper in Section 6.

The train timetabling problem has been studied in the literature both in a deterministic setting and with consideration of uncertain passenger demand. To optimize train timetables adapted to a dynamic passenger demand environment, Barrena et al. [4] used flow variables to construct a linear representation of the objective function and presented a branch-and-cut algorithm to solve this formulation. Wang et al. [5] considered a changing passenger arrival rate and proposed an event-driven model to solve the train scheduling problem for an urban rail transit network. Cadarso and de Celis [6] introduced robust itineraries to reduce the number of passengers with missed connections and proposed an integrated model to update base schedules in terms of timetable and fleet assignments while considering stochastic demand and uncertain operating conditions. Wang et al. [7] proposed a multiobjective mixed-integer nonlinear programming model for the problem of metro train scheduling and rolling stock circulation planning under time-varying passenger demand. To improve the reliability, efficiency, and attractiveness of public transport service under fluctuating passenger demand, Cao et al. [8] used holding and speed-changing operational strategies to optimize real-time schedules and proposed a solution methodology based on time-space graphical techniques to minimize schedule changes. Meng and Zhou [9] designed an integrated demand-service-resource optimization model for managing limited infrastructure and rolling stock resources to maximize operators’ profits and passenger travel demand satisfaction. For more information, we refer to the review papers by Cacchiani and Toth [10], Harrod [11], and Cacchiani et al. [12].

Many researchers have applied stochastic programming (SP) approaches to transportation network planning under uncertainty. To optimize slack time allocation in train timetables on high-speed passenger dedicated lines, Niu and Meng [13] used a two-stage SP model with recourse, in which the first-stage decision allocates the slack time in the train timetabling phase and the second stage simulates the execution of the train timetable considering “train dispatching” behaviors. Meng and Zhou [14] proposed a robust single-track train dispatching model under a dynamic and stochastic environment and designed a scenario-based rolling horizon solution approach to systematically generate and select meet-pass plans under different stochastic scenarios. Based on railway optimization by means of alternative graphs (ROMA) [15] and Environment for the desiGn and simulaTion of RAIlway Networks (EGTRAIN) [16], Quaglietta et al. [17] set up an innovative framework to investigate the stability of optimal dispatching plans against the dynamic evolution of randomly disturbed traffic conditions. To minimize energy consumption in metro operations, Li and Lo [18] formulated an integrated dynamic train scheduling and control optimization framework that satisfies the changing passenger demand during daily metro operations. Hassannayebi et al. [19] presented a robust train timetable model that adapts to dwell time variability and to travel time and demand uncertainty in metro networks to improve service. They used a two-stage simulation optimization approach based on a genetic algorithm to minimize the expected passenger waiting times. Shakibayifar et al. [20] proposed a two-stage SP model to cope with stochastic fluctuations of arrival rates in an urban train timetabling problem. Considering the uncertainty of disruptions in railway operations, Zhu and Goverde [21] formulated and solved a robust timetable rescheduling problem using a rolling horizon two-stage SP method.

We summarize some of the most relevant studies on train scheduling in relation to our paper in Table 2. As shown in the table and discussed above, although several researchers have already considered the classical train scheduling problem, both for railways and metro systems, to the best of our knowledge none has so far considered the extra metro train scheduling problem. Since these two problems are quite different (see also Table 1), it is not possible to tackle the extra train scheduling problem by adapting existing models from the standard scheduling literature. Therefore, a new mathematical model is needed to describe this problem, which is challenging due to its stochastic and biobjective nature, as discussed in Section 1. Also worth mentioning are studies that focus on the specific timetabling aspect of synchronizing the last trains in railway or metro systems, recent examples of which include Yang et al. [22], Chen et al. [23], and Long et al. [1]. Although the last metro train synchronization problem also considers factors such as the successful transfer of passengers and the running time of the last trains, it is conceptually very different from the extra train scheduling problem tackled in this paper, where the main decision is how many additional trains to add at night to serve delayed HSR passengers. Moreover, the extra train scheduling problem involves dependencies between two systems (HSR and metro). Based on the achievements and gaps in the literature, the main contributions of this paper are the following:
(1) We study the extra metro train scheduling problem, a new application that has not previously been studied in the literature. This application has practical relevance as it allows metro operators to reduce operating costs and to increase passenger satisfaction when scheduling extra metro trains in emergency cases, e.g., disruptions on connecting railway/HSR lines.
(2) We provide a new formulation of this problem that captures realistic but complex features such as the trade-off between metro operator and passenger costs (i.e., it is biobjective) and the uncertainty in the arrival times of delayed HSR trains (i.e., it is stochastic). Specifically, our formulation is a two-stage mixed-integer linear SP model, where the number of extra metro trains is determined at the first stage and their schedules at the second stage. By accounting for uncertainty explicitly in the form of scenarios, our model produces first-stage decisions that are reliable across all scenarios and hence improve the robustness of the extra metro train timetable.
(3) We introduce two new cost functions to model the costs incurred (i) by the metro operator when scheduling extra metro trains at night and (ii) by delayed HSR passengers who may fail to get the last metro trains. Our optimization approach accounts for both objectives.
(4) We conduct realistic numerical experiments based on the metro system and HSR lines in Beijing and show that our SP approach is very effective at solving the problem. Specifically, in an out-of-sample evaluation, our approach outperforms by 3–5% a deterministic optimization model that replaces the uncertainty with its expected value, which translates to a considerable amount of money in practice. We further assess the quality of our SP solutions by computing a perfect information lower bound, obtaining average optimality gaps below 1%.

3. Problem Statement and Assumptions

In this section, we provide a statement of the problem and its assumptions. We start below by describing the inputs to the problem, i.e., the information on which the metro operator’s decisions are based.

(I1) HSR lines and the connected metro lines. We are given a network that includes a set of HSR lines and a set of metro lines. HSR and metro lines are directly connected at some hub stations. In Figure 1(a), we illustrate an example of a simple network consisting of one HSR line and two metro lines, where for simplicity we only represent the hub station and not the other metro stations. We identify the metro lines by distinguishing between operation directions, as shown in the example in Figure 1(b).

(I2) A set of uncertain delayed HSR trains. We are given a set of HSR trains with uncertain delays, e.g., a set of trains affected by a disruption on the HSR line. For each delayed HSR train, we know the number of passengers on board, who are divided into as many groups as there are operation directions of the connected metro lines. For each group, we are given its volume and the transfer walking time between the platform of the HSR line and the corresponding metro platform. Finally, we are given probabilistic information on the arrival time of each delayed HSR train at the hub station. This can be a probability distribution or a discrete set of scenarios, each with a delay time and an occurrence probability.

(I3) A set of candidate extra metro train services. For each metro line and operation direction, we consider a set of candidate extra metro train services that the operator may decide to schedule. For each extra metro train service, we are given its origin and destination stations, the running time between consecutive stations, the dwell time at each station, and the passenger-carrying capacity.

Given inputs (I1)–(I3), extra metro train scheduling is the problem faced by the metro operator of serving passengers from delayed HSR trains with the goal of minimizing operational costs while maximizing the number of served passengers. The metro operator has to choose the number of extra metro train services to schedule on each metro line and operation direction before knowing the exact arrival times of the HSR trains at the hub stations, i.e., knowing only a delay distribution. When the HSR arrival times become known, the metro operator schedules the selected extra metro train services by defining their departure times at the first station and the headway between two successive trains in the same operation direction. We formalize this problem mathematically in Section 4.

In the definition of our problem and related mathematical model, we assume the following:
(A1) Passenger transfer activities between metro trains are not considered.
(A2) The rolling stock rescheduling of the metro system is neglected.
(A3) The passenger-carrying capacity of each extra metro train is fixed and given.
(A4) For all extra metro train services in the same operation direction, the stopping pattern, the running time between two stations, and the dwell time at each station are the same.
(A5) The passenger transfer walking time at the hub station is known and fixed. In the numerical study, we use the slowest transfer walking time among passengers, but other choices are possible.
(A6) A passenger is not willing to wait for a metro train service at night for more than a maximum, fixed time allowance. This allowance could be set, for instance, to the average waiting time for a taxi service at night. If the waiting time is higher, the passenger will select another transport mode.
(A7) Rescheduling the HSR system is not considered, as it is exogenous to the metro operator.

4. Biobjective Stochastic Programming Model

In this section, we present our stochastic programming model for scheduling extra metro trains to serve uncertain delayed HSR passengers. For convenience, we start in Section 4.1 by summarizing the nomenclature. In Section 4.2, we formally describe the decision-making process underlying our optimization model. In Sections 4.3 and 4.4, respectively, we define the objective functions and constraints of the model. Since our stochastic program is biobjective, we explain in Section 4.5 our approach to solve it.

4.1. Notation

In Table 3, we introduce the notation used to define our model. This table lists, in order, subscripts and sets, input parameters, and decision variables.

4.2. Decision-Making Process

After a major service disruption on an HSR line at night, the metro operator needs to run extra metro trains to transport the passengers arriving on the delayed HSR trains. The duration of a major disruption is typically uncertain and may last several hours, resulting in HSR trains reaching the hub metro station very late, after the working shifts of the metro staff have ended. Thus, it is important that the metro operator decides on the number of extra train services as soon as it is informed about the disruptive event, so that there is enough time to notify the metro staff, including drivers, and the passengers on the delayed HSR trains. In other words, this decision is taken under uncertainty. Formally, the decision-making process consists of two stages, as illustrated in Figure 2. At the first stage, the metro operator is informed about the disruption on the HSR network and decides on the number of extra metro train services to schedule. This decision is called the first-stage (or here-and-now) decision and is made knowing the set of delayed HSR trains, the number of passengers on these trains, and delay information for each HSR train in the form of a set of scenarios or a probability distribution. However, this decision has to be taken before knowing the exact arrival times of the delayed HSR trains, so that drivers and staff for the extra metro services can be notified with sufficient margin (see also our related discussion in Section 1); in other words, without knowing which scenario will be realized.

Upon disclosure of the HSR arrival times (e.g., when the disruption is resolved and the HSR trains are rescheduled), the metro operator defines the exact schedule of the previously selected extra metro train services. Specifically, the operator chooses the departure time of each extra metro train service, the headway between two adjacent train services, and, implicitly, the assignment of passengers to these services. These decisions are referred to as second-stage (or wait-and-see) decisions because they adapt to the realization of the uncertainty, that is, they can be chosen after the uncertain scenario is revealed. The power of SP is that, while determining the first-stage decisions (i.e., the number of extra metro train services), it also takes the uncertainty into account and endogenously models the second-stage decisions that define the detailed extra metro train timetable in each scenario (i.e., departure times, headways, and passenger assignments).

4.3. Objective Functions: Operational and Passenger Costs

As discussed in Section 1, the metro operator has to balance the operational cost of scheduling extra metro train services against the number of passengers that can be served by these services. Below, we thus formalize two cost functions describing operational and passenger costs. We start by identifying the three most relevant performance indicators for our problem: the operation-ending time of the last extra metro train service (OET), the Number of Extra metro Train services (NET), and the Number of Passengers who Fail to get service (NPF). The first two indicators (OET and NET) are used to define the operational cost, while the third (NPF) is used to define the passenger cost. Since these indicators have different units, we convert them all into monetary units using conversion factors, as suggested by one of the metro operators in China [24]. The operator can choose the conversion factors depending on the relative importance of each of the three indicators.

4.3.1. Operational Cost

Late at night, the metro operator prefers operating fewer extra metro trains and ending operations early to reduce operating expenses, which corresponds to having low NET and OET indicators. We quantify NET and OET in equations (1a) and (1b), respectively, and illustrate the two functions in Figure 3. As shown in equation (1a) and Figure 3(a), the metro operator pays a fixed NET cost for each extra metro train service, which is intuitive. Moreover, the operator incurs an auxiliary OET cost that is mainly due to lighting, air conditioning, and overtime pay for staff at the stations served by the extra trains. The OET cost depends only on the operation-ending time and not on the number of extra metro trains, which decouples these two cost components. In this paper, we assume that the OET cost increases linearly with the operation-ending time at a constant cost per unit of time (the slope), as indicated in equation (1b) and shown in Figure 3(b). Specifically, for each operation direction, this equation captures the difference between the operation-ending time of the last extra metro train in that direction and the predetermined operation-ending time of the last (nonextra) metro train service. In contrast to the number of extra train services, which is a first-stage decision, the operation-ending time can only be determined at the second stage and is therefore indexed by the scenario. The total operational cost incurred by the metro operator when scheduling extra metro trains is given in equation (1c) as the sum of the two components NET and OET.
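The two operational cost components can be made concrete with a short computation for a single scenario. The Python sketch below is illustrative only and is not the formulation of equations (1a)–(1c). The function names `hhmm_to_seconds` and `operational_cost`, the dictionary inputs keyed by operation direction, and the convention that a direction without extra trains adds no OET cost are all our own assumptions. The default unit costs are the case-study values reported in Section 5.1 (20000 RMB per extra train and 5 RMB per second).

```python
def hhmm_to_seconds(hhmm):
    """Convert 'HH:MM' to seconds after midnight (write 24:30 for 00:30 next day)."""
    hours, minutes = map(int, hhmm.split(":"))
    return 3600 * hours + 60 * minutes


def operational_cost(n_extra, end_time, planned_end,
                     fixed_cost=20000.0, cost_per_second=5.0):
    """Operational cost (NET, OET, total) of one scenario, in RMB.

    n_extra:     dict direction -> number of extra trains (first-stage decision)
    end_time:    dict direction -> ending time (s) of the last extra train
                 in this scenario (second-stage outcome)
    planned_end: dict direction -> planned ending time (s) of the last
                 regular (nonextra) train
    """
    # NET component: a fixed cost per extra train, identical in every scenario.
    net_cost = fixed_cost * sum(n_extra.values())
    # OET component: linear in the extension beyond the planned ending time.
    # Assumption: directions without extra trains contribute nothing.
    oet_cost = 0.0
    for direction, planned in planned_end.items():
        if n_extra.get(direction, 0) > 0:
            extension = max(0, end_time[direction] - planned)
            oet_cost += cost_per_second * extension
    return net_cost, oet_cost, net_cost + oet_cost


# Hypothetical example: two extra trains on direction 1, whose last extra
# train ends at 23:45 instead of the planned 23:15.
costs = operational_cost({1: 2}, {1: hhmm_to_seconds("23:45")},
                         {1: hhmm_to_seconds("23:15")})
```

In this hypothetical example, the 30-minute extension costs 9000 RMB and the two trains cost 40000 RMB, so the NET part dominates unless operations are extended by more than about an hour per train.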

4.3.2. Passenger Cost

It is well known that, in general, a timetable that only minimizes operational costs is disadvantageous to passengers. This issue is even more acute in our extra metro train scheduling problem, since considering operational cost alone would result in no extra metro train being scheduled at all. Therefore, it is imperative that passengers also be accounted for. The cost for passengers could be modeled using performance measures such as the number of transported passengers (e.g., [1]), passenger travel time (e.g., [25–27]), passenger waiting time (e.g., [4, 28, 29]), and delay time [30]. Although minimizing total or average waiting time is a common objective in train scheduling, in last/extra train optimization (i.e., late in the evening or at night), most approaches focus on feasibility (i.e., whether passengers can reach their homes) rather than optimality in terms of travel time (e.g., [1] and references therein). Thus, given the peculiarity of our scheduling problem, we propose using the NPF as the passenger cost function, since passengers are eager to leave the transfer station at night, and we also include constraints on the maximum time passengers are willing to wait for their metro connection. The NPF is defined as the number of passengers who cannot leave the hub transfer station on any extra metro train service.

Equation (2) quantifies NPF, i.e., the passenger cost. As shown in Figure 4, a unit cost is imposed on each passenger who fails to board any extra metro train, so the total passenger cost is this unit cost multiplied by NPF.
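A minimal sketch of the passenger cost for one scenario follows. It assumes passengers are grouped by HSR train and metro operation direction as in input (I2), and that the assignments already satisfy the flow balance constraints (6). The function name `passenger_cost`, the train labels, and the dictionary layout are hypothetical; the 20 RMB default is the per-passenger cost used in Section 5.1.

```python
def passenger_cost(volume, assigned, unit_cost=20.0):
    """Passenger cost of one scenario: NPF times a per-passenger unit cost.

    volume:   dict (hsr_train, direction) -> delayed passengers in that group
    assigned: dict (hsr_train, direction, service) -> passengers assigned
              (assumed to satisfy the flow balance constraints (6))
    Returns (npf, cost_in_rmb).
    """
    served = {}
    for (train, direction, _service), x in assigned.items():
        key = (train, direction)
        served[key] = served.get(key, 0) + x
    # Passengers of each group who were not assigned to any extra train.
    npf = sum(vol - served.get(key, 0) for key, vol in volume.items())
    return npf, unit_cost * npf


# Hypothetical groups "G1" and "G2" transferring to direction 1.
npf, cost = passenger_cost({("G1", 1): 300, ("G2", 1): 200},
                           {("G1", 1, 1): 300, ("G2", 1, 1): 50})
```

Here 150 passengers of group "G2" are left behind, giving a passenger cost of 3000 RMB.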

Our biobjective formulation minimizes both the expected cost of the metro operator and the expected cost of the passengers over the scenarios; these are the objective functions (3).
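The expectation over scenarios can be illustrated with a finite scenario set. The sketch below assumes that each scenario's operational and passenger costs have already been computed; the function name `expected_costs` and the numbers are hypothetical. Note that the NET part of the operational cost is the same in every scenario, because it depends only on the first-stage decision, whereas OET and NPF vary with the realized HSR arrival times.

```python
def expected_costs(scenarios):
    """Expected (operator, passenger) costs over a finite scenario set.

    scenarios: list of (probability, operator_cost, passenger_cost) tuples.
    """
    total_probability = sum(p for p, _, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    expected_operator = sum(p * op for p, op, _ in scenarios)
    expected_passenger = sum(p * pax for p, _, pax in scenarios)
    return expected_operator, expected_passenger


# Two equally likely hypothetical scenarios sharing the same NET cost
# (40000 RMB) but with different OET and NPF outcomes.
e_operator, e_passenger = expected_costs([(0.5, 49000.0, 3000.0),
                                          (0.5, 52000.0, 1000.0)])
```

The two expectations are kept separate rather than summed, since the model treats them as two objectives.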

4.4. Constraints

We specify below the different constraints in our problem.

4.4.1. Passenger Waiting Time

The passenger waiting time is defined as the time that passengers have to wait for a metro train after they arrive at the metro platform of the hub station. In particular, constraints (4a) model the transfer waiting time based on the departure time of the connecting extra metro train service, the arrival time of the HSR train, and the walking time from the HSR train to the connecting extra metro train service. Constraints (4b) and (4c) are big-M constraints. Constraints (4b) ensure that if passengers from an HSR train are assigned to an extra metro train service (i.e., the corresponding binary assignment variable equals 1), then the transfer between the two should be possible for these passengers, i.e., the transfer waiting time should be nonnegative. Constraints (4c) are similar but impose an upper bound on the waiting time equal to the time allowance for passenger transfer waiting time (in minutes). As discussed in our assumptions, if the waiting time exceeds this allowance, passengers would select other transportation modes rather than wait for a long time in the metro station at night.
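The logic of constraints (4a)–(4c) can be sketched as follows. We assume the waiting time equals the metro departure time minus the HSR arrival time and the walking time, all in minutes; the function names and the big-M value of 1440 minutes (one day, assumed large enough to exceed any attainable waiting time) are our own choices. The 15-minute allowance and the 10- and 15-minute walking times are the Section 5.1 values.

```python
def waiting_time(metro_departure, hsr_arrival, walking_time):
    """Transfer waiting time on the metro platform (all times in minutes)."""
    return metro_departure - (hsr_arrival + walking_time)


def transfer_feasible(metro_departure, hsr_arrival, walking_time, max_wait=15.0):
    """Logical content of (4b)-(4c): 0 <= waiting time <= allowance."""
    w = waiting_time(metro_departure, hsr_arrival, walking_time)
    return 0.0 <= w <= max_wait


def big_m_holds(w, assigned, max_wait=15.0, big_m=1440.0):
    """Big-M form: both bounds bind only when the binary `assigned` equals 1.

    big_m is an assumed value that must exceed any attainable |w|.
    """
    slack = big_m * (1 - assigned)
    return -slack <= w <= max_wait + slack


# HSR train arriving at minute 10, 10-minute walk, metro leaving at minute 30.
print(transfer_feasible(30, 10, 10))   # waiting time 10 minutes
print(transfer_feasible(40, 10, 10))   # waiting time 20 minutes
```

When the binary variable is 0, the slack makes both inequalities redundant, so an infeasible transfer is simply never assigned passengers.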

4.4.2. Mapping between Continuous and Binary Passenger Assignment Variables

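Constraints (5) map the continuous passenger assignment variables to the binary (0-1) passenger assignment variables. These constraints model the following if-then condition: if a positive number of passengers from an HSR train is assigned to an extra metro train service, then the corresponding binary assignment variable equals 1.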
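An if-then condition of this kind is commonly linearized with a big-M inequality. As a sketch in hypothetical notation (not that of Table 3), let x denote the number of passengers from HSR train i assigned to extra service k in direction d under scenario s, let z be the corresponding binary variable, and let M be an upper bound on x, such as the train capacity or the group volume:

```latex
x^{s}_{idk} > 0 \;\Longrightarrow\; z^{s}_{idk} = 1
\qquad\text{is enforced by}\qquad
0 \le x^{s}_{idk} \le M\, z^{s}_{idk}, \quad z^{s}_{idk} \in \{0,1\}.
```

If z = 0, the inequality forces x = 0; if z = 1, it is redundant as long as M is at least the largest feasible value of x. This sketch encodes only the stated direction of the implication.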

4.4.3. Passenger Flow Balance

Constraints (6) ensure that the total number of passengers from an HSR train assigned to the connected extra metro train services in the same operation direction is less than or equal to the number of delayed passengers from that train who transfer to this operation direction.

4.4.4. Capacity of Extra Metro Trains

Constraints (7) enforce the maximum passenger-carrying capacity of each extra metro train.
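Constraints (6) and (7) can be checked together on a candidate assignment. The sketch below aggregates assigned passengers per passenger group and per extra train and reports any violation; the function name, dictionary layout, and train labels are hypothetical, and the capacity of 1480 passengers is the Section 5.1 value for operation direction 1.

```python
from collections import defaultdict


def check_flow_and_capacity(volume, assigned, capacity):
    """Return violated constraints of type (6) and (7); empty list if feasible.

    volume:   dict (hsr_train, direction) -> delayed passengers in that group
    assigned: dict (hsr_train, direction, service) -> passengers assigned
    capacity: dict direction -> passenger-carrying capacity per extra train
    """
    per_group = defaultdict(float)  # passengers assigned per (train, direction)
    per_train = defaultdict(float)  # load of each extra train (direction, service)
    for (train, direction, service), x in assigned.items():
        per_group[(train, direction)] += x
        per_train[(direction, service)] += x
    violations = []
    for key, total in per_group.items():          # flow balance (6)
        if total > volume.get(key, 0):
            violations.append(("flow", key))
    for (direction, service), load in per_train.items():  # capacity (7)
        if load > capacity[direction]:
            violations.append(("capacity", (direction, service)))
    return violations


volume = {("G1", 1): 1000, ("G2", 1): 800}
capacity = {1: 1480}
# Hypothetical plan: service 1 is filled exactly, service 2 takes the rest.
plan = {("G1", 1, 1): 1000, ("G2", 1, 1): 480, ("G2", 1, 2): 320}
```

With this plan, `check_flow_and_capacity(volume, plan, capacity)` returns an empty list, whereas moving 600 passengers of "G2" onto service 1 would overload it.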

4.4.5. Starting Time of the Extra Metro Train Service at the Origin Station

Constraints (8) require each extra metro train to leave the hub station after the last scheduled (i.e., nonextra) train on the same line has departed.

4.4.6. Headway Time between Consecutive Metro Trains

Constraints (9a) define the headway time between two consecutive metro train services on the same line, i.e., the difference between the departure times of the two trains from the origin station. Note that the candidate train services are sequentially numbered and must be selected in sequence. If two consecutive train services on a line are selected by the metro operator, then constraints (9b) require the headway time between them to be no less than a minimum headway time, which is needed to ensure safe movements.
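Because candidate services are numbered and selected in sequence, a first-stage choice of n trains simply activates the first n candidates, and the second-stage timetable of one direction can be checked against constraints (8), (9a), and (9b). The sketch below assumes departure times in minutes and uses a non-strict inequality for (8); the function names are hypothetical, while the 15 candidates and the 3-minute minimum headway are the Section 5.1 values.

```python
def selected_services(n_selected, n_candidates=15):
    """Candidate services are used in sequence: the first n are selected."""
    return [k < n_selected for k in range(n_candidates)]


def schedule_feasible(departures, last_regular_departure, min_headway=3.0):
    """Check constraints of type (8), (9a), and (9b) for one direction.

    departures: origin departure times (minutes) of the selected extra
                services, listed in candidate order.
    """
    # (8): no extra train leaves before the last regular train (non-strict).
    if any(t < last_regular_departure for t in departures):
        return False
    # (9a): headways between consecutive selected services.
    headways = [later - earlier for earlier, later in zip(departures, departures[1:])]
    # (9b): every headway respects the minimum safe headway.
    return all(h >= min_headway for h in headways)


print(selected_services(3, n_candidates=5))
print(schedule_feasible([0, 5, 8], last_regular_departure=0))
print(schedule_feasible([0, 2], last_regular_departure=0))
```

The second schedule fails because its 2-minute headway is below the 3-minute minimum.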

4.4.7. Mapping between First-Stage and Second-Stage Variables

Constraints (10) map the relevant second-stage decision variables to the first-stage decision variables, which describe whether each candidate train service on each line is selected by the metro operator for serving delayed passengers.

4.4.8. Departure Time at Intermediate Stations

Constraints (11) ensure that the departure time of an extra metro train service at an intermediate station is no smaller than the arrival time of this train at the same station plus the minimum required dwell time.

4.4.9. Arrival Time at Intermediate Stations

Constraints (12) ensure that the arrival time of an extra metro train at a station is at least as large as the sum of the departure time of this train from the previous station and the travel time between the two stations.
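When constraints (11) and (12) hold with equality, the timetable of an extra train along its direction follows from its origin departure time by adding running and dwell times alternately. The sketch below computes these earliest times; the function name and the toy running and dwell times are hypothetical. Under our assumption that a trip ends when the train reaches its terminal, the last arrival time is the operation-ending time used in the OET cost.

```python
def propagate_timetable(origin_departure, run_times, dwell_times):
    """Earliest arrival and departure times along one operation direction.

    origin_departure: departure time (minutes) from the origin (hub) station
    run_times:   run_times[j] = running time from station j to station j + 1
    dwell_times: dwell_times[j] = minimum dwell at intermediate station j + 1
                 (one entry per intermediate station, len(run_times) - 1 items)
    Returns (arrivals, departures) for the stations after the origin.
    """
    if len(dwell_times) != len(run_times) - 1:
        raise ValueError("need one dwell time per intermediate station")
    arrivals, departures = [], []
    departure = origin_departure
    for j, run in enumerate(run_times):
        arrival = departure + run                  # (12) taken at equality
        arrivals.append(arrival)
        if j < len(dwell_times):                   # intermediate station
            departure = arrival + dwell_times[j]   # (11) taken at equality
            departures.append(departure)
    return arrivals, departures


# Hypothetical three-segment trip leaving the hub at minute 0.
arrivals, departures = propagate_timetable(0, [2, 3, 4], [0.5, 0.5])
```

In the case study, the sum of running and dwell times per direction is given directly (42, 35, and 60 minutes), so only the last arrival time would be needed there.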

4.5. Epsilon-Constraint Formulation

Recall that we aim to solve a problem that is not only stochastic but also biobjective, defined by objective functions (3) subject to constraints (4)–(12). Since all objective functions and constraints are linear, this mathematical program is a biobjective mixed-integer linear program.

Several approaches exist in the literature to determine the set of nondominated (i.e., Pareto-optimal) solutions of a multiobjective optimization problem. The most common are scalarization techniques, which construct a single-objective problem related to the original multiobjective one and typically solve it multiple times to find a subset of the nondominated solutions [31]. One such technique is the weighted-sum method, in which the objectives are combined through a convex combination into a single objective. Although the weighted-sum method (with strictly positive weights) is guaranteed to produce Pareto-optimal solutions, it has the well-known drawback that it can only find Pareto-optimal solutions that lie on the boundary of the convex hull of the nondominated set. In other words, if the nondominated set (i.e., the Pareto front, in our case of two objectives) is nonconvex, then not all Pareto-optimal solutions can be found [31]. To overcome this shortcoming, we choose a different scalarization technique known as the epsilon-constraint method, which retains only one objective for minimization and turns the others into constraints.
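The drawback of the weighted-sum method can be seen on a toy front. The three (operator cost, passenger cost) points below are hypothetical and mutually nondominated, but the middle one lies above the segment joining the other two, so the front is nonconvex. Sweeping the weight over [0, 1] never selects it.

```python
# Hypothetical nondominated (operator cost, passenger cost) points.
# (6, 6) lies above the line x + y = 10 through the other two points.
points = [(0.0, 10.0), (6.0, 6.0), (10.0, 0.0)]


def weighted_sum_pick(points, w):
    """Point minimizing w * operator cost + (1 - w) * passenger cost."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


picked = {weighted_sum_pick(points, k / 100) for k in range(101)}
print(sorted(picked))
```

For any weight, the better of the two extreme points scores at most 5, while the middle point always scores 6, which is why it is never chosen.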

Our epsilon-constraint formulation minimizes the expected passenger cost alone while imposing an upper bound ε on the expected cost of the metro operator:

By repeatedly solving model (13) with different values of ε, we can approximate the Pareto front of the proposed biobjective optimization problem.
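The sweep itself is independent of the inner model. In the hedged sketch below, `solve(eps)` is a hypothetical callback that returns the optimal (operator cost, passenger cost) pair of model (13) for a given ε, or `None` if that ε is infeasible. To keep it runnable, a brute-force toy stands in for the MIP. The toy covers one direction, charges only the fixed train cost, and uses an invented number of stranded passengers. As in Section 5.3, the sweep starts from the solution that minimizes passenger cost alone and then tightens ε.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (expected operator cost, expected passenger cost)

def epsilon_sweep(solve: Callable[[float], Optional[Point]],
                  eps_grid: Iterable[float]) -> List[Point]:
    """Solve the epsilon-constraint problem once per epsilon and keep nondominated points."""
    found = set()
    for eps in eps_grid:
        point = solve(eps)
        if point is not None:          # None signals an infeasible epsilon
            found.add(point)
    return sorted(p for p in found
                  if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in found))

# --- Hypothetical toy instance (one direction, fixed train cost only) ---
FIXED_COST = 20_000   # RMB per extra train (Section 5.1)
PAX_PENALTY = 20      # RMB per unserved passenger (Section 5.1)
CAPACITY = 1_480      # passengers per train (Section 5.1, direction 1)
STRANDED = 5_000      # made-up number of passengers from delayed HSR trains

def toy_solve(eps: float) -> Optional[Point]:
    """Brute force: minimize passenger cost subject to operator cost <= eps, n = 0..15."""
    options = [(PAX_PENALTY * max(STRANDED - CAPACITY * n, 0), FIXED_COST * n)
               for n in range(16) if FIXED_COST * n <= eps]
    if not options:
        return None
    pax, op = min(options)             # lowest passenger cost, ties -> lowest operator cost
    return (float(op), float(pax))

extreme = toy_solve(float("inf"))      # passenger cost alone: (80000.0, 0.0)
front = epsilon_sweep(toy_solve, [extreme[0], 60_000, 40_000, 20_000, 0])
print(front)
# [(0.0, 100000.0), (20000.0, 70400.0), (40000.0, 40800.0), (60000.0, 11200.0), (80000.0, 0.0)]
```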

5. Numerical Study

In this section, we present numerical experiments based on a real-world network in Beijing, China. We describe our case study and the uncertainty scenarios in Sections 5.1 and 5.2, respectively. We then discuss our results, starting with the trade-off between passenger cost and metro operator cost in Section 5.3, followed by a performance comparison between our SP approach, a deterministic benchmark, and a dual bound in Section 5.4. We conclude in Section 5.5 by evaluating the methods out-of-sample.

5.1. Test Case Description

Our numerical experiments are based on network and operational data from the Beijing metro system and two HSR lines: Beijing–Tianjin (BT) and Beijing–Shanghai (BS). The network we consider consists of 44 stations in total and is illustrated in Figure 5. We consider two metro lines with three operation directions (i.e., Line 14 up, Line 4 up, and Line 4 down). Lines 4 and 14 of the Beijing subway system are connected with the BT and BS HSR lines at Beijing South Railway Station (BSRS).

We set the maximum number of extra metro train services that can be scheduled to 15 for each operation direction, which, based on experimentation, is a reasonable upper bound. All extra metro trains operating in the same direction have identical travel times and passenger-carrying capacities. The travel time (i.e., the sum of the running times between consecutive stations and the dwell times at each station) of the extra metro trains in operation directions 1, 2, and 3 is 42, 35, and 60 minutes, respectively. The capacity of trains in operation directions 1, 2, and 3 is 1480, 1480, and 1960 passengers, respectively. The minimum headway between two successive extra metro train services in the same operation direction is 3 minutes. The planned operation-ending time of the last metro train service in operation directions 1, 2, and 3 is 23:15, 23:03, and 22:40, respectively. The passenger transfer walking time from the platforms of the BT and BS lines to the metro platforms is 10 and 15 minutes, respectively, for each metro line and operation direction. The unit operating cost is 5 Chinese renminbi (RMB) per second, and the fixed cost of operating an extra train service is 20,000 RMB. Moreover, we assign a cost of 20 RMB to each passenger who fails to get service and a time allowance of 15 minutes for passenger transfer waiting time.
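The cost parameters above can be combined into simple cost functions. The sketch below is a hedged illustration and not the paper's objective (3). It assumes that the 5 RMB per-second cost is charged on the extension of a direction's operation-ending time beyond its planned value. It also writes times after midnight as hours past 24 (e.g. "24:45"). The helper names are hypothetical.

```python
FIXED_COST_RMB = 20_000     # per extra train service (Section 5.1)
COST_PER_SECOND_RMB = 5     # unit operating cost (Section 5.1)
FAILED_PAX_COST_RMB = 20    # per passenger who fails to get service (Section 5.1)

def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes after midnight; write post-midnight times as '24:45' etc."""
    hours, minutes = map(int, hhmm.split(":"))
    return 60 * hours + minutes

def operator_cost(n_extra: int, planned_end: str, actual_end: str) -> int:
    """Hypothetical operator cost (RMB) for one operation direction.

    ASSUMPTION: the per-second cost is charged on the extension of the
    operation-ending time beyond the planned one.
    """
    extension_s = 60 * max(to_minutes(actual_end) - to_minutes(planned_end), 0)
    return FIXED_COST_RMB * n_extra + COST_PER_SECOND_RMB * extension_s

def passenger_cost(n_failed: int) -> int:
    """Penalty (RMB) for passengers who fail to get a metro service."""
    return FAILED_PAX_COST_RMB * n_failed

# Direction 1 (planned end 23:15) with 7 extra trains and a hypothetical new end at 00:45.
print(operator_cost(7, "23:15", "24:45"))  # 167000
print(passenger_cost(100))                 # 2000
```

Here 167,000 RMB consists of 7 × 20,000 RMB of fixed cost plus 5 RMB × 5,400 s for the 90-minute extension.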

Regarding the two HSR lines, we consider 20 delayed trains arriving at the hub station, i.e., BSRS. Table 4 shows the planned arrival time and passenger-carrying volume of each train. Since in China all HSR passengers need to book their tickets in advance with a specified departure time from the origin station, the precise number of passengers onboard different trains is known from the ticketing system.

We solve our optimization model with CPLEX 12.3, using its default settings, as the mixed-integer linear programming solver. The experiments were performed on a computer equipped with an Intel® Core™ i7-8550 CPU @ 1.80 GHz and 8 GB of RAM.

5.2. Scenarios of the Uncertainty and Computational Time

We model the stochastic arrival time of each delayed HSR train at the hub station using Gaussian, Weibull, and uniform probability distributions, which are commonly used in the literature to model train delays [17, 21, 32–35]. These distributions are shown in Table 5 and are defined so that they all have the same expected value of one hour. In the experiments presented here and in Section 5.3, we focus on the Gaussian distribution alone. We then consider all three distributions in Sections 5.4 and 5.5 to assess the robustness of our findings with respect to the uncertainty distribution.
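A minimal sampling sketch follows. It assumes that the common one-hour expected value refers to the delay added to each train's planned arrival time, measured in minutes. Table 5's actual parameters are not reproduced here. The Gaussian standard deviation, the uniform half-width, and the Weibull shape below are hypothetical. Only the Weibull scale is derived, using the Weibull mean scale · Γ(1 + 1/shape), so that every distribution has a mean of 60 minutes. The sketch uses only Python's standard `random` module.

```python
import math
import random

MEAN_DELAY_MIN = 60.0  # all three distributions share an expected delay of one hour

# HYPOTHETICAL spread parameters (Table 5 values are not reproduced here).
SIGMA = 15.0           # Gaussian standard deviation, minutes
HALF_WIDTH = 30.0      # uniform on [60 - 30, 60 + 30]
WEIBULL_SHAPE = 2.0    # Weibull shape k; the scale below makes the mean equal 60
WEIBULL_SCALE = MEAN_DELAY_MIN / math.gamma(1 + 1 / WEIBULL_SHAPE)

def sample_delay(dist: str, rng: random.Random) -> float:
    """Draw one HSR delay (minutes) from the named distribution."""
    if dist == "gaussian":
        # Negative draws are clipped to 0; at 4 standard deviations this barely moves the mean.
        return max(rng.gauss(MEAN_DELAY_MIN, SIGMA), 0.0)
    if dist == "uniform":
        return rng.uniform(MEAN_DELAY_MIN - HALF_WIDTH, MEAN_DELAY_MIN + HALF_WIDTH)
    if dist == "weibull":
        return rng.weibullvariate(WEIBULL_SCALE, WEIBULL_SHAPE)  # (alpha=scale, beta=shape)
    raise ValueError(f"unknown distribution: {dist}")

def sample_scenarios(planned_arrivals, dist, n_scenarios, seed=0):
    """Each scenario holds one realized arrival time (minutes) per delayed HSR train."""
    rng = random.Random(seed)
    return [[t + sample_delay(dist, rng) for t in planned_arrivals]
            for _ in range(n_scenarios)]
```

With the planned arrivals of Table 4 (not reproduced here), `sample_scenarios(planned, "gaussian", 9)` would return nine candidate in-sample scenarios. The sketch does not cover how occurrence probabilities are then assigned to them.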

We refer to the scenarios used in the SP model to compute the decisions as the in-sample scenarios. It is well known in stochastic programming that the quality of the first-stage decisions depends on the quality and number of in-sample scenarios. Typically, using more scenarios yields a better approximation of the uncertainty distribution and hence a better decision. On the other hand, the number of variables and constraints in the model grows with the number of scenarios, so the number of in-sample scenarios is limited by the available computing power and time. Choosing this number therefore entails a trade-off between solution quality and computational effort and is usually nontrivial.

We investigate this trade-off by assessing the solvability of our SP model for a number of in-sample scenarios varying between 1 and 10. For each number of scenarios, we solve the model 5 times, each time sampling a different set of scenarios from the probability distribution. Figure 6 shows the minimum, average, and maximum computational time over the 5 runs. As expected, the computational time increases with the number of scenarios. The model is solved relatively quickly with 9 scenarios, for which the average and maximum running times are 523 and 960 seconds, respectively. However, moving from 9 to 10 scenarios increases the average and maximum running times by about 140% and 240%, respectively. We therefore use 9 in-sample scenarios in our experiments, for which the stochastic program is solved to optimality in modest computation time. The 9 scenarios used in Section 5.3 are displayed in Figure 7, which shows the arrival time of each HSR train at BSRS. The occurrence probabilities of scenarios 1 to 9 are 0.09, 0.12, 0.11, 0.11, 0.1, 0.12, 0.12, 0.1, and 0.13, respectively.
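The reported figures can be checked with a few lines of arithmetic. The sketch below verifies that the nine occurrence probabilities sum to one. It also defines a hypothetical helper for probability-weighted expectations, assuming (as is standard in two-stage SP) that expected costs weight each scenario by its probability. Finally, it converts the reported runtime increases into approximate 10-scenario runtimes.

```python
import math

# Occurrence probabilities of in-sample scenarios 1-9 (Section 5.2).
PROBS = [0.09, 0.12, 0.11, 0.11, 0.10, 0.12, 0.12, 0.10, 0.13]

def expected(values, probs=PROBS):
    """Probability-weighted average of per-scenario values (hypothetical helper)."""
    if len(values) != len(probs):
        raise ValueError("one value per scenario is required")
    return sum(v * p for v, p in zip(values, probs))

# Runtimes implied by the reported increases when moving from 9 to 10 scenarios.
AVG_9, MAX_9 = 523, 960        # seconds, 9 scenarios
avg_10 = AVG_9 * (1 + 1.40)    # +140% -> roughly 1255 s
max_10 = MAX_9 * (1 + 2.40)    # +240% -> roughly 3264 s

print(math.isclose(sum(PROBS), 1.0), round(avg_10), round(max_10))  # True 1255 3264
```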

5.3. Trade-Off between Passenger and Operator Cost

In this section, we investigate the biobjective aspect of our stochastic optimization problem. We consider the 9 in-sample scenarios introduced in Section 5.2 and use the epsilon-constraint method to produce the Pareto frontier illustrated in Figure 8. To obtain this figure, we first solved a model that minimizes passenger cost alone (i.e., without the epsilon-constraint). This model provided the extreme point of the Pareto frontier in the bottom-right corner of the figure, with a passenger cost of zero and an operator cost of approximately 590,000 RMB. We then obtained 9 other Pareto solutions by setting epsilon to values below 590,000 RMB, namely, 550,000, 510,000, 460,000, 420,000, 370,000, 330,000, 280,000, 230,000, and 190,000 RMB. For each value of epsilon, small solid circles show the metro operator and passenger costs of the timetables obtained under each of the 9 scenarios. The larger circles represent the expected cost of these 9 timetables for each value of epsilon; we call these solutions the SP solutions.

Table 6 summarizes the most relevant information from Figure 8, including the operator and passenger costs of the SP solutions. The values in this table and the convex shape of the frontier suggest diminishing returns when lowering passenger cost. To elaborate, consider solution 1 in the table, in which the operator cost is low but the passenger cost is high (i.e., the leftmost point in Figure 8). Moving from solution 1 to solution 2, for an additional cost of about 50,000 RMB, the operator can reduce the percentage of failed services by 15%, which is a significant reduction. The operator can reduce this percentage by a further 13% for a similar cost increase when moving from solution 2 to solution 3. However, the further right we move along the frontier, the more expensive it becomes to reduce the percentage of failed services. For example, moving from solution 8 to solution 9 reduces this percentage by only an additional 3% for a similar increase in operating cost. This finding suggests that, in our case study, the operator should strive to reduce the percentage of failed services to 30–40% or lower, as it is relatively cheap to do so, whereas serving more than 90–95% of passengers might be too expensive and hence not economical.
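The diminishing returns can be quantified as the extra operator cost per percentage point of failed services removed. The sketch below reads the reported reductions (15, 13, and 3) as percentage points. It also takes each "similar" cost increase as exactly 50,000 RMB, which approximates the text rather than reproducing the values in Table 6.

```python
def marginal_cost(extra_cost_rmb: float, reduction_pp: float) -> float:
    """Extra operator cost (RMB) per percentage point of failed services removed."""
    return extra_cost_rmb / reduction_pp

# Moves along the frontier described in the text, each costing about 50,000 RMB.
for move, reduction_pp in [("1 -> 2", 15), ("2 -> 3", 13), ("8 -> 9", 3)]:
    print(move, round(marginal_cost(50_000, reduction_pp)), "RMB per point")
# 1 -> 2 3333 RMB per point
# 2 -> 3 3846 RMB per point
# 8 -> 9 16667 RMB per point
```

Under these approximations, the last move costs roughly five times as much per point as the first.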

For the extreme SP solution 10, in which all passengers are served (i.e., the passenger cost is 0 and the operator cost is maximal), Figure 9 reports the departure time of each extra metro train service from BSRS for operation directions 1, 2, and 3 under each scenario. As Figure 9 shows, the first-stage decision (i.e., the number of extra metro train services) is the same in every scenario: 7 extra metro train services are needed in each operation direction. The fact that all three operation directions need the same number of services is a feature of this instance rather than a general result. The departure times of the extra metro train services typically differ across scenarios because they are second-stage decisions and can therefore adapt to the realized HSR train arrival times.

5.4. Deterministic Benchmark and Dual Bound

In the following, we define a deterministic model that replaces future uncertainties with their expected values and that we use as a benchmark for our stochastic model. We call the solution of this model the expected value solution, henceforth the EV solution. We also define a perfect information model that provides a lower bound on the optimal cost, and we call its solution the perfect information solution, or PI solution. We discuss both models below.

5.4.1. EV Solution

A common solution procedure to solve a stochastic optimization problem is to replace all random variables with their best available estimate, namely, their expected value, and solve a deterministic model. In our case, this means constructing the timetable using the expected values of the delay time of high-speed trains rather than multiple delay scenarios. To formalize, the EV solution is the extra metro train timetable that is obtained as optimal solution to the following formulation:and constraints (4a)–(12).

The EV and SP solutions are computed with models that differ in both objective function and constraints. Consequently, the objective value of the EV model is not directly comparable to that of our SP model. For a fair comparison, we evaluate the EV solution on the same 9 scenarios used in the SP. Specifically, the EV solution provides a first-stage decision, i.e., the number of extra metro trains, based on deterministic information. We fix this decision and, for each scenario, optimize the second-stage decision (i.e., the timetable) and compute the passenger and metro operator costs. The sample average across scenarios gives the expected cost of the EV solution.

5.4.2. PI Solution

The PI solution is obtained by relaxing the nonanticipativity constraints embedded in our SP model and assuming full information about the future. Mathematically, this means making the first-stage decision (i.e., the number of extra metro trains) scenario-dependent, so that this decision also adapts to the uncertainty. The PI solution provides a dual (lower) bound on the optimal cost because it exploits information that is not available to the decision maker in reality. For the same reason, it cannot be implemented in practice: the operator needs time to arrange unplanned shifts of metro drivers and staff, which cannot be done after the HSR trains have already arrived. Formally, for a given scenario of the uncertainty, the PI solution solves the following hindsight model:and constraints (4a)–(12).

PI solutions should also be evaluated on the same 9 scenarios used for the SP and EV solutions.
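The three procedures can be illustrated on a deliberately tiny, hypothetical instance. In the sketch below, the first-stage decision is the number of extra trains in a single direction, and each scenario is a made-up number of stranded passengers. The recourse is collapsed into serving as many of these passengers as the trains can carry, so no timetable is optimized. The fixed cost, penalty, and capacity echo Section 5.1, while the demands and probabilities are invented. The sketch computes the SP decision, the EV decision evaluated on the same scenarios, and the scenario-wise PI bound.

```python
# Hypothetical single-direction toy instance, NOT the paper's model.
FIXED, PENALTY, CAP, N_MAX = 20_000, 20, 1_480, 15   # RMB, RMB/passenger, passengers, trains
DEMANDS = [2_000, 4_000, 6_000]                      # made-up stranded passengers per scenario
PROBS = [0.3, 0.4, 0.3]                              # made-up scenario probabilities
CHOICES = range(N_MAX + 1)                           # first-stage options: n extra trains

def total_cost(n: int, demand: float) -> float:
    """Train cost plus penalty for passengers exceeding the capacity of n trains."""
    return FIXED * n + PENALTY * max(demand - CAP * n, 0)

def expected_cost(n: int) -> float:
    """Expected total cost of committing to n trains before the scenario is known."""
    return sum(p * total_cost(n, d) for d, p in zip(DEMANDS, PROBS))

n_sp = min(CHOICES, key=expected_cost)                         # SP: here-and-now decision
mean_demand = sum(p * d for d, p in zip(DEMANDS, PROBS))
n_ev = min(CHOICES, key=lambda n: total_cost(n, mean_demand))  # EV: plan for the mean
sp, eev = expected_cost(n_sp), expected_cost(n_ev)             # EV decision fixed, evaluated in-sample
pi = sum(p * min(total_cost(n, d) for n in CHOICES)            # PI: n re-chosen per scenario
         for d, p in zip(DEMANDS, PROBS))
print(n_sp, n_ev, round(sp), round(eev), round(pi), round(eev - sp), round(sp - pi))
# 2 3 66560 69360 57600 2800 8960
```

In this toy, planning for the mean demand leads the EV model to commit to one more train than the SP. The resulting 2,800 RMB difference is this instance's value of the stochastic solution, and the 8,960 RMB gap between SP and PI is its expected value of perfect information. For minimization problems, the PI cost never exceeds the SP cost, which in turn never exceeds the evaluated EV cost.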

We now compare the SP, EV, and PI solutions for a specific point in the Pareto frontier corresponding to . Figure 10 shows the total expected cost (passengers plus metro operator) of the three solutions under each of the three probability distributions in Table 5.

As shown in Figure 10, the SP solutions decrease the expected total cost compared to the EV solutions by 3.93%, 2.23%, and 3.10% for the Gaussian, Weibull, and uniform distributions, respectively. On average across distributions, our SP method improves on the EV approach by 3.09%, which is a significant improvement. The difference between the expected costs of the EV and SP solutions is known as the value of the stochastic solution (VSS; [36]). The high VSS indicates that accounting for the uncertainty explicitly through multiple scenarios is valuable in our extra metro train scheduling application and would allow the metro operator to save a considerable amount of money while providing the same service level to passengers.

Compared to SP, using perfect information decreases the expected total cost by 1.31%, 0.23%, and 0.72% for the three distributions. In other words, our SP solutions achieve an average optimality gap of 0.75%, i.e., they are near optimal. The difference between the SP and PI objective values is known as the expected value of perfect information (EVPI; [36]). On average, the EVPI we obtain is 4146 RMB, which is relatively low; it represents the maximum amount the metro operator would be willing to pay to know the realization of the uncertainty in advance.
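For reference, the two measures can be written in notation common in stochastic programming textbooks. RP denotes the expected cost of the SP solution, EEV the expected cost of the EV solution evaluated on the scenarios, and WS the expected cost of the PI (wait-and-see) solutions. These labels are ours and do not appear in the model above. For a minimization problem the ordering WS ≤ RP ≤ EEV always holds, so both measures are nonnegative. The averages quoted in the text follow directly from the three per-distribution values:

```latex
\begin{aligned}
\mathrm{VSS} &= \mathrm{EEV} - \mathrm{RP} \ge 0, \qquad
\mathrm{EVPI} = \mathrm{RP} - \mathrm{WS} \ge 0,\\
\overline{\mathrm{VSS}}_{\%} &= \tfrac{1}{3}\,(3.93 + 2.23 + 3.10)\% \approx 3.09\%,\\
\overline{\mathrm{gap}}_{\%} &= \tfrac{1}{3}\,(1.31 + 0.23 + 0.72)\% \approx 0.75\%.
\end{aligned}
```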

5.5. Out-of-Sample Valuation

Recall that we employ a two-stage SP model to solve our extra metro train scheduling problem and that the solution of this model is tied to the chosen scenarios, i.e., the in-sample scenarios described in Section 5.2. Owing to limited computing power and the complexity of the SP model, we selected 9 in-sample scenarios. Our SP model implicitly assumes that these 9 scenarios are the only possible realizations of the uncertainty. However, they only provide a discrete approximation of the true uncertainty, which is described by the full delay probability distributions of all incoming HSR trains. As a consequence, the SP results based on the in-sample scenarios might be optimistically biased.

To investigate whether this is the case, and to obtain a fair and unbiased comparison of the methods, we evaluate the performance of the three solutions (SP, EV, and PI) out-of-sample as follows. We sample 50 new scenarios for each HSR train from each of the three probability distributions in Table 5 (i.e., Gaussian, Weibull, and uniform). For each scenario, we fix the SP first-stage decision and obtain the optimal second-stage decision by solving the recourse optimization problem. The sample average over the 50 scenarios provides the out-of-sample expected cost of the SP solution. We proceed analogously with the EV model discussed in Section 5.4 to obtain the out-of-sample valuation of the EV solution. For the PI model, we proceed as for SP and EV, except that the first-stage decision is not fixed but is re-optimized in each out-of-sample scenario, i.e., it adapts to the realization of the uncertainty.
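Because the valuation procedure is generic, it can be sketched independently of the MIP. In the hedged sketch below, `cost(x, s)` is a hypothetical callback. It returns the first-stage cost of decision `x` plus the optimal second-stage cost in scenario `s`; in the actual study, each call would solve the recourse problem. The toy cost function and the uniformly drawn demands are invented for illustration.

```python
import random
from statistics import mean
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")  # first-stage decision (e.g. number of extra trains)
S = TypeVar("S")  # one sampled scenario

def out_of_sample_cost(x: X, cost: Callable[[X, S], float],
                       scenarios: Sequence[S]) -> float:
    """SP or EV valuation: fix x, re-optimize the recourse in every scenario, and average."""
    return mean(cost(x, s) for s in scenarios)

def out_of_sample_pi(choices: Sequence[X], cost: Callable[[X, S], float],
                     scenarios: Sequence[S]) -> float:
    """PI bound: the first-stage decision is re-chosen in every scenario."""
    return mean(min(cost(x, s) for x in choices) for s in scenarios)

# Hypothetical usage: first-stage cost plus (trivial) recourse cost for demand d.
def toy_cost(n: int, d: float) -> float:
    return 20_000 * n + 20 * max(d - 1_480 * n, 0)

rng = random.Random(42)
fresh = [rng.uniform(2_000, 6_000) for _ in range(50)]   # 50 new, made-up scenarios
print(out_of_sample_cost(2, toy_cost, fresh), out_of_sample_pi(range(16), toy_cost, fresh))
```

An unweighted mean is appropriate here because the out-of-sample scenarios are independent draws, each carrying weight 1/50.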

Figure 11 reports the probability density functions of the total cost (operator plus passengers) and of the passenger cost resulting from the out-of-sample valuation of the different methods under the Gaussian, Weibull, and uniform distributions. As Figures 11(a), 11(c), and 11(e) show, the SP distribution is substantially shifted to the left of the EV distribution. The total expected cost of SP is 4.19%, 4.91%, and 3.12% lower than that of EV for the three distributions. Moreover, the perfect information lower bound is 0.99% lower than SP on average across distributions. These results are consistent with the in-sample results discussed previously and support our choice of 9 scenarios. Finally, the relative performance of the methods is similar for the probability density functions of passenger cost displayed in Figures 11(b), 11(d), and 11(f).

6. Concluding Remarks

In this paper, we studied the problem of scheduling extra metro trains to serve passengers of delayed high-speed railway trains with uncertain arrival times, which is a new application in the literature. To solve this problem, we proposed a two-stage stochastic program formulated as a mixed-integer linear program. Our optimization problem is biobjective and balances the operational cost incurred by the metro operator against the passenger cost, defined as the number of passengers who miss the last metro trains. To illustrate the relevance of our two-stage SP approach, we performed numerical experiments using real-world data from the Beijing metro network and two HSR lines connected to this network. We generated a Pareto frontier and provided insights into how to balance operator and passenger costs. Additionally, we compared the performance of our SP solution with that of a deterministic model that uses a forecast of the uncertainty (EV solution) and a hindsight model that relaxes the nonanticipativity constraints (PI solution). We found that, when evaluated out-of-sample, our SP solution improves on the EV solution by 3–5% and exhibits average optimality gaps below 1%, hence it is near optimal.

Future research avenues include the following extensions of the problem. First, one could consider detailed passenger origin-destination (OD) demand and passenger behavior to provide better services for passengers [37]. The resulting problem would be more complex, as it would require scheduling extra trains in the whole metro network rather than only at the connecting metro-HSR station, as we do in this paper. Second, stopping patterns could be included in the model to identify the optimal stopping pattern for each extra train service according to the passenger OD demand. Finally, the number of passengers allocated to metro trains could be treated as a first-stage decision so that the metro operator could dispatch the rolling stock more conveniently. This extension would also be challenging, as it would involve jointly scheduling the extra metro train services and the rolling stock.

Data Availability

Previously reported data were used to support this study and are available at the website of Beijing Subway: https://www.bjsubway.com/station/xltcx/line1/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Sihui Long and Lingyun Meng were supported by the National Natural Science Foundation of China (grant nos. 61790573 and 72022003), the State Key Laboratory of Rail Traffic Control and Safety (Beijing Jiaotong University, grant no. RCS2019ZJ001), and the National Key Research Project (grant no. 2018YFB1201403). Xiaojie Luan was supported by the NWO (Netherlands Organisation for Scientific Research) Rubicon (grant no. 019.192EN.014). Alessio Trivella and Francesco Corman were supported by the Swiss National Science Foundation (project DADA, grant no. 181210).