Abstract

As a critical foundation for train traffic management, the train stop plan is linked to several other plans in high-speed railway operation. The current approach to train stop planning in China is based primarily on passenger demand volume and the preset level of each high-speed railway station. To optimise the stop plan efficiently, this study proposes a novel method that uses machine learning techniques and requires neither a predetermined hypothesis nor a complex solution algorithm. Clustering techniques are applied to assess the features of the service nodes (e.g., the station level). A modified Markov decision process (MDP) is used to express the entire stop plan optimisation process under several constraints (service frequency at stations and number of train stops). A restrained MDP-based stop plan model is formulated, and a numerical experiment with real-world train operation data collected from the Beijing-Shanghai high-speed railway demonstrates the performance of the proposed approach.

1. Introduction

In most countries, high-speed railway (HSR) plays a significant role in daily life owing to its reliability, safety, low emissions, and energy savings. With increasing passenger demand and the growing scale of the high-speed railway network, transport organisation is becoming increasingly complex, yet train operation management must be maintained at an acceptable efficiency level. The train stop plan is a key element of the operation plan for satisfying the increasing passenger demand and reducing the operational costs of the railway company. The stop plan affects the frequency of train service and the load of trains at each station, which directly influences transportation resource utilisation. Assad [1] presented the train operation plan as an optimisation problem with hierarchical structures. The train stop planning (TSP) problem, first proposed by Patz [2], is generally considered a subproblem of the train operation plan. As a critical link in the structure of train operational management plans, the stop plan is strongly correlated with the route, stop pattern, travel range, number, type, and level of all involved trains. It focuses on determining three primary variables: the stations served by each train, the number of stops for each train, and the service frequency at each station.

With a limited train fleet size and station capacity, it is critical for the stop plan to simultaneously consider passenger demand and train stop patterns, with the goal of achieving service-demand equilibrium with the given transportation resources. A contradiction always exists in the TSP problem: a sufficient number of train stops are required for serving passengers along the railway line, but too many train stops result in reduced operating efficiency and low resource utility. Railway companies must find an effective means of balancing transportation costs and varying passenger demand.

The TSP provides a critical foundation for a complete train timetable for the entire railway network, especially for railway systems that cannot compute the timetable directly due to the scale of operations. In China, the TSP is in practice determined from passenger volume predictions and the preset station level, an indicator of the significance of each station. This is a convenient way to find a solution quickly, and the subjective station levels may implicitly carry abundant information. However, as social and economic conditions along the railway change, this parameter is not updated fast enough, which can lead to an unreasonable train stop plan. As a basic decision support plan, the train stop plan is typically underestimated in the literature (see Section 2). Thus, in our train stop plan study, we jointly consider railway properties, such as travel distance, train fleet size, and types of trains, and service-node features, such as scale, population, and economics, and examine how these factors together influence the train stop plan solution. To this end, a data-mining approach is applied to a dataset of railway and service-node factors to analyse the station level, an important foundation for making a stop plan. A modified Markov decision process (MDP), called the RMDP stop plan model, is proposed to explore the best policy for the stop plan using all the train operation data through iterative training and feedback based on the station level.

This study makes contributions in the following areas: (i) railway properties and city features are listed as input parameters to better reflect environmental influence and improve train stop plan quality, (ii) a data-mining technique is applied to explore the station level through quantitative analysis of effective features using a dataset from the Beijing-Shanghai high-speed railway, and (iii) a restrained Markov decision process (RMDP) is proposed to find the optimal policy to achieve a better train stop plan.

The remainder of this paper is organised as follows: Section 2 reviews the recent literature related to the train stop plan problem; Section 3 introduces the problem background and the framework of the stop plan and proposes a stop plan scheduling model based on a reinforcement learning approach; Section 4 presents the experiment and numerical results; and Section 5 presents conclusions and future research.

2. Literature Review

The hierarchical structure of the train operation plan comprises the following sequential subproblems: train operation zone, stop schedule (including train stop planning and train timetabling), rolling stock, and crew scheduling [3, 4]. Generally, train stop plans are considered jointly with other subproblems in train operation plans [5–8]. Recently, Niu et al. [9] constructed an optimal model considering train stop planning and timetabling for minimising passenger waiting time to balance the time-dependent demand. Yang et al. [10] developed an optimisation method for both train stop planning and train scheduling problems to provide collaborative operation strategies. Altazin et al. [11] aimed to minimise the recovery and waiting times of passengers with a multiobjective model considering the stop schedule. Wang et al. [12] studied the integration of train scheduling and rolling stock circulation planning under time-varying passenger demand. Qi et al. [4] considered dynamic passenger flow and proposed a robust optimisation model for train timetabling and stop planning. Zhu and Goverde [13] formulated a timetable rescheduling model with flexible stopping and flexible turning considering retiming, reordering, and cancelling. In short, the TSP problem is commonly combined with other subproblems.

However, in these studies the stop plan is usually simplified or taken as a known input condition. Such treatment, albeit idealistic, may ignore the influence of the stop plan on actual operation. The stop plan is also an important subproblem related to passenger service quality and is influenced by a series of factors (e.g., stop stations and station service frequency). Thus, the TSP problem deserves study in its own right.

A few studies have focused on the pure TSP problem. Li et al. [14] proposed a model to minimise the total number of trains stopping and the node service frequency with a constraint of the number of trains stopping. Niu et al. [15] formulated an optimal model considering uncertain passenger flow demand at each station to minimise the total stop times of all trains. Xu et al. [16] focused on balancing the number of trains between major station stops and high frequency stops, aiming to minimise the total passenger time loss generated from both train stops and transfers. However, most of these studies focused on the optimisation model construction based on varying factors.

The design of the solution algorithm is another critical component of the optimisation problem, given the complex factors and the scale of the variables and constraints. Optimisation models of this type are typically NP-hard. Even then, they often rely on idealistic parameter assumptions that leave some variables without empirical analysis, and searching for appropriate solutions to the TSP problem is time-consuming [4].

The reinforcement learning approach is an alternative to the optimisation method for solving TSP and is widely used in decision problems [17–19]. However, to the best of our knowledge, few studies have applied reinforcement learning to solve TSP.

3. Methodologies

3.1. Problem Description

With station location, station capacity, passenger demand, train operational zones, train travel distance, fleet size, number of stops, skip-stop strategy, and train type as railway operation inputs and administrative level, population, and GDP of cities as environmental inputs, our study aims to generate a capacity-equilibrium train stop plan with the best trade-off between quality of passenger transportation service and rail operation cost. A stop plan regulates the stop pattern of each train in a railway line, stopping or not stopping for passenger boarding/alighting at each station.

Train stop planning must always select some stations for each train based on passenger demand. Although increased train stops provide better passenger service, they also lead to reduced transportation efficiency (increase in travel time, less train throughput, and mismatching service supply). We introduce a clustering analysis and a modified Markov decision process-based framework to consider the railway and its environmental factors to determine a coordinated train stop plan solution that uses reasonable transportation resources and adequately satisfies the passenger demand.

3.2. Solution Framework

To solve the TSP problem, a machine learning-based two-stage solution framework is developed to gain insights into the impacts of the station characteristics and formulate a new model to achieve the optimal stop plan. In the first stage, unsupervised clustering analysis is applied to explore the railway properties and service-node features along the railway line to classify the stations; this is the primary input of the TSP problem. In the second stage, a restrained Markov decision process (RMDP) is used to optimise the high-speed railway stop plan. The framework is shown in Figure 1.

3.3. Clustering Analysis of Railway Station Level

The purpose of clustering analysis in solving the TSP problem is to provide railway station-level classification in a multifactorial manner that takes more environmental influences (features) into account, rather than considering the station level based only on its scale and location in the railway network. Fuzzy c-means clustering (FCM, see [20]) is based on the theory of the fuzzy set. A fuzzy set comprises samples and their respective degrees of membership in the set. It is a soft classification without strict subordination relations. It is an unsupervised technique that puts “similar” samples into the same group and explores the patterns reflected by different groups without prior information. Because it needs no idealistic or empirical hypothesis, this technique is more adaptive and practical than others and is beneficial for the analysis of stations with diversified features. Let $X = \{x_1, x_2, \ldots, x_N\}$ be the samples representing the stations along the line and $N$ the number of stations. Each station has $K$ features; that is, $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})$. The value of each feature can be extracted from real operational data, such as train type, fleet size, and skip-stop strategy.

FCM is used to divide the station samples into $C$ clusters. Each cluster is characterised by its sample mean, known as the centroid. The model objective is to minimise the summation of the weighted distance between each sample and the centroid of each cluster, as in equation (1), that is, to minimise the differences in the station properties within the same cluster. The approach is a standard and widely used data-mining approach [21] and is proven to be effective for knowledge discovery from a high-dimensional dataset:

$$J_{FCM} = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} d_{ij}^{2}, \tag{1}$$

where $m$ is the fuzzy factor that determines the fuzzy weight of the clustering results, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, and $c_j$ is the centroid of cluster $j$ in the $K$-dimensional feature space. Let $d_{ij}$ denote the Euclidean distance between $x_i$ and $c_j$. Note that the distance between each sample and each cluster centroid is measured by the Euclidean norm as in equation (2), where $x_{ik}$ represents the $k$-th feature of the $i$-th transformed sample and $c_{jk}$ denotes the location of the centroid at the $k$-th dimension:

$$d_{ij} = \lVert x_i - c_j \rVert = \sqrt{\sum_{k=1}^{K} (x_{ik} - c_{jk})^{2}}. \tag{2}$$

Fuzzy partitioning is conducted through an iterative optimisation of the objective function shown in equation (1), with the updated degree of membership calculated using the following equation:

$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( d_{ij} / d_{il} \right)^{2/(m-1)}}. \tag{3}$$

The cluster centroid can be updated using the following equation:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}. \tag{4}$$

The iterative algorithm terminates when $\lVert V^{(t+1)} - V^{(t)} \rVert < \varepsilon$, where $\varepsilon$ is a stop criterion and $V^{(t)}$ is the cluster centroid matrix at iteration $t$, whose rows are the centroids $c_j$. This procedure converges to a local minimum point of $J_{FCM}$. The aforementioned procedure does not specify the number of clusters. The optimal number of clusters in our study was determined based on the Xie–Beni coefficient [22] and a separation coefficient [23]. The number of station levels can be obtained from the optimal number of clusters. The station levels can then be listed according to each cluster’s average score, which is calculated from the feature means of its stations. Denote $\bar{x}_{i}^{j}$ as the feature mean of station $i$ in the $j$-th cluster. The average score $F_j$ of the cluster can be formulated as

$$F_j = \frac{1}{N_j} \sum_{i=1}^{N_j} \bar{x}_{i}^{j}, \tag{5}$$

where $N_j$ is the number of stations in the $j$-th cluster.

Station levels are ranked by the average scores: the greater the score, the higher the station level.
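The clustering and ranking procedure of this subsection can be illustrated with a minimal sketch in plain Python. It is not the implementation used in the study: the function names, the random initialisation, the default fuzzy factor of 2, the tolerance, and the hard assignment of each station to its highest-membership cluster before scoring are all assumptions made for illustration, and the features are assumed to be normalised to comparable ranges beforehand.

```python
import math
import random


def fuzzy_c_means(samples, n_clusters, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Cluster `samples` (lists of equal length) with fuzzy c-means.

    Returns (centroids, memberships); memberships[i][j] is the degree of
    membership of sample i in cluster j. All defaults are illustrative.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centroids = [[0.0] * dim for _ in range(n_clusters)]
    for _ in range(max_iter):
        # Centroid update: membership-weighted mean of the samples.
        new_centroids = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            new_centroids.append(
                [sum(w[i] * samples[i][k] for i in range(n)) / wsum
                 for k in range(dim)]
            )
        # Membership update from Euclidean distances to the new centroids.
        for i in range(n):
            d = [max(math.dist(samples[i], c), 1e-12) for c in new_centroids]
            for j in range(n_clusters):
                u[i][j] = 1.0 / sum(
                    (d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(n_clusters)
                )
        # Stop once no centroid has moved by more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < eps:
            break
    return centroids, u


def rank_station_levels(samples, memberships):
    """Return a level per sample (1 = highest) from cluster average scores."""
    n_clusters = len(memberships[0])
    labels = [max(range(n_clusters), key=lambda j: row[j]) for row in memberships]
    scores = {}
    for j in set(labels):
        # Feature mean of every station assigned to cluster j.
        means = [sum(s) / len(s) for s, lab in zip(samples, labels) if lab == j]
        scores[j] = sum(means) / len(means)
    order = sorted(scores, key=scores.get, reverse=True)
    level_of = {j: rank + 1 for rank, j in enumerate(order)}
    return [level_of[lab] for lab in labels]
```

For example, calling `fuzzy_c_means(features, 3)` on a list of normalised station feature vectors and passing the memberships to `rank_station_levels` yields one level per station. Choosing the number of clusters with a validity index such as Xie–Beni would wrap this call in a loop over candidate cluster counts.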

3.4. Markov Decision Process-Based Train Stop Planning Model

The Markov decision process is a significant machine learning concept in artificial intelligence. It has been widely used to formulate decision-making problems in terms of four essential elements: state, action, policy, and reward [24]. The best policy can be discovered through repeated trials and feedback based on real data, without idealistic assumptions or empirical estimation. However, the stop plan is a special kind of decision-making problem. As in the standard MDP, each state transition is based on the current state; unlike the standard MDP, each state is generated under constraints on the number of train stops and the station service frequency. These constraints must be considered in each state transition step during the decision process. Accordingly, we propose a restrained MDP (RMDP) based model.

3.4.1. Model Structure

The decision to be made in the TSP problem is the selection of a series of feasible service-node (station) sequences for each train individually. Each sequence is taken as a stop scheme for a train and as an action. The action sequences form the MDP decision chain. Each state transition decision is made depending only on the current state and is not related to earlier states in the chain. With regard to the Markov property, the stop plan decision-making process can be formulated based on the standard MDP [25, 26]:

In this study, the Markov decision process is a five-tuple $(S, A, P, R, \mathcal{T})$:
(i) $S$ is a finite set of states $s$, expressing stop information for each station.
(ii) $A$ is a finite set of actions $a$, including all possible actions.
(iii) $P_a(s, s')$ is the probability that state $s$ transitions to state $s'$ with action $a$.
(iv) $R_a(s, s')$ is the assigned reward for the system transitioning from state $s$ to state $s'$ with action $a$.
(v) $\mathcal{T}$ is a finite set of state transition epochs $t$, with $t \le T$, where $T$ is the maximum epoch.

As shown in Figure 1, the given train fleet size is taken as the total number of system states $T$. The process must make decisions for the system from the initial state to the final state. For each epoch of the decision process, only the current state $s$ and the available actions can be considered as deciding factors to change the system to the next state $s'$; a reward is assigned for this choice. Owing to the different probabilities of the available actions, the assigned reward is based on the predefined transition probabilities $P_a(s, s')$. There may be several branches of available actions to choose from in each epoch. The final goal is to find a chain (a policy) that maximises the utility (or value) function. Decisions are made over a finite number of decision epochs.

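The utility function for discounted Markov decision problems is defined as

$$U^{\pi}(s) = E_{\pi}\left[ \sum_{t} \gamma^{t} r_t \,\middle|\, s \right],$$

where the sum runs over the decision epochs, $r_t$ is the reward assigned at epoch $t$, and $\pi$ is the policy being followed. The parameter $\gamma$ is the discount factor and is conducive to the convergence of the function. Specifically, if $\gamma$ is close to zero, considerable attention is paid to instant gratification and the convergence is faster; otherwise, the final reward is considered more significant.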
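The effect of the discount factor can be seen in a few lines of Python. This is only an illustration of the standard discounted sum of rewards; the function name and the example rewards are hypothetical, and the discount factor is written `gamma` here.

```python
def discounted_utility(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last epoch backwards (Horner's scheme).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma = 0` only the first reward counts, whereas with `gamma = 1` every epoch contributes equally, which matches the trade-off between instant and final reward described above.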

The system is defined to find a sequence of actions that produces an optimal policy $\pi^{*}$. To achieve this goal, the final reward can be defined by maximising the utility function; the system can be formed as

$$\pi^{*} = \arg\max_{\pi} U^{\pi}(s).$$

3.4.2. Actions Definition

Generally, actions are the triggers related to changing states. In each epoch, any action will induce the corresponding state. However, in this study, we define actions as the number of train stops. Thus, the action set $A$ ranges from the lowest number of stops (two: origin station and destination station) to the maximum available number of train stops $m$, expressed as

$$A = \{2, 3, \ldots, m\}.$$

For the dataset we collected, the Beijing-Shanghai high-speed railway has a special station, Nanjing South, at which all trains must stop for necessary technical operational work. Hence, the minimum number of stops for a train in this study is 3, and the action set ranges from 3 to $m$: $A = \{3, 4, \ldots, m\}$.

The probability density of the stop-times distribution can be obtained using statistical analysis methods. After analysing the trains collected from real-world operation data, the results showed that the actions set followed the Gaussian distribution with and .
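One way to turn a fitted Gaussian into probabilities over the discrete action set is to evaluate the density at each stop count and normalise. The sketch below does this in Python; the function name is hypothetical, and the mean and standard deviation passed in the usage note are placeholder values, not the parameters fitted in this study.

```python
import math


def action_probabilities(min_stops, max_stops, mu, sigma):
    """Discretised Gaussian weights over stop counts min_stops..max_stops.

    Returns a dict mapping each action (number of stops) to a probability;
    the probabilities are normalised to sum to 1 over the action set.
    """
    actions = list(range(min_stops, max_stops + 1))
    weights = [math.exp(-0.5 * ((a - mu) / sigma) ** 2) for a in actions]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}
```

For instance, `action_probabilities(3, 9, mu=6.0, sigma=1.5)` gives the highest probability to six stops and symmetric, smaller probabilities to five and seven stops.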

In the TSP problem, each action is attributed to a set of stop schemes resulting from the diversity of selections for all stations with the action parameter: the train stop-times.

3.4.3. State Definition and State Filter Process

The set of states is a key constituent in the decision process. In this section, we combine the particularity of the TSP problem with the characteristics of the state space of the MDP to achieve the adaptive state set for each epoch.

Let s be expressed by the total number of stops at station n, the number of stop schemes , and the current epoch t. Here, . The station order is denoted as , . The system state set S can be expressed aswhere . Because any stop scheme should relate to a train, we define the total number of available epochs (states) T as the train quantity, which is a given condition in this problem.
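The state described here can be pictured as a small immutable record that carries the cumulative number of stops at each station, the stop schemes chosen so far, and the current epoch. The Python sketch below is an assumed representation for illustration only; the class and field names are hypothetical, and a stop scheme is encoded as a tuple of 0/1 flags in station order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopPlanState:
    epoch: int                 # number of trains scheduled so far
    stops_per_station: tuple   # cumulative stops at each station
    schemes: tuple = ()        # stop schemes chosen so far, one per train

    def apply(self, scheme):
        """Return the next state after assigning `scheme` to the next train."""
        if len(scheme) != len(self.stops_per_station):
            raise ValueError("scheme must have one flag per station")
        updated = tuple(c + x for c, x in zip(self.stops_per_station, scheme))
        return StopPlanState(self.epoch + 1, updated, self.schemes + (scheme,))
```

Because one epoch corresponds to one train, applying a stop scheme advances the epoch by one, so the process ends after as many transitions as there are trains.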

During the decision process, there is a set of states that can be triggered by different actions in each epoch. In the TSP problem, the state set consists of many train stop schemes related to the total number of possible combinations of stations. However, not every probable stop scheme adapts to the current state. To simplify the range of the state set, we consider constraints based on the current state and the action probability distribution. With regard to the filter process, the appropriate state set varies in each epoch.

Thus, several critical constraints must be built. For the number of stops and service frequency restrictions, the constraints can be divided into two sides: vertical and transverse. The current state is .

The vertical constraints comprise two parts: one is the maximum number of stops at each station (denoted as , such that the parameter ) in the current epoch t must be less than or equal to the maximum number for stops of all trains, which can be represented as ; the second is the remaining number of stops at each station (). The remaining number of stops at station z from the current state to the last state must be equal to the remaining number of states . It is also related to the maximum number of stops at each station.

The transverse constraint is the maximum number of stops for each train. The number of stops for each train (denoted as ) must be less than or equal to its threshold . The entire state filter process is shown in Figure 2.
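A simplified version of this filter can be sketched in Python as follows. It covers only two of the conditions above, the per-station stop cap (vertical) and the per-train stop cap (transverse), and leaves out the remaining-stops condition. The function name, the argument names, and the `mandatory` set of stations that every train must serve (for example origin, destination, and a technical stop) are assumptions introduced for the example.

```python
from itertools import combinations


def feasible_schemes(n_stops, station_counts, station_caps, train_cap, mandatory):
    """Enumerate stop schemes with exactly n_stops stops that respect the caps.

    Each scheme is a tuple of 0/1 flags, one per station in line order.
    """
    n_stations = len(station_counts)
    # Transverse constraint: a train may not exceed its own stop cap.
    # The action must also leave room for every mandatory station.
    if n_stops > train_cap or n_stops < len(mandatory):
        return []
    optional = [z for z in range(n_stations) if z not in mandatory]
    schemes = []
    for extra in combinations(optional, n_stops - len(mandatory)):
        chosen = set(mandatory) | set(extra)
        # Vertical constraint: no chosen station may exceed its stop cap.
        if any(station_counts[z] + 1 > station_caps[z] for z in chosen):
            continue
        schemes.append(tuple(1 if z in chosen else 0 for z in range(n_stations)))
    return schemes
```

The returned list plays the role of the substate set for one action: every scheme in it has the requested number of stops and keeps all stations within their caps.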

Generally, a greater number of stops at stations and for trains increases passenger convenience. However, it always leads to a higher capacity occupation, increased operation costs, and increased total travel time. Thus, both constraints are important for balancing the operation cost and service quality.

For the quality of passenger service, passenger demand is an indispensable factor influencing the maximum number of stops at each station. Hence, we must consider another parameter: the train stop rate of station z, denoted as , which is connected between the maximum number of stations and the passenger flow. It is expressed aswhere and are the passenger flow departures from station and arriving at station , respectively. is the passenger flow density of the section. The coefficient is valued by the average number of stops by the same type of trains, which is often set by the railway department. is the weight coefficient based on the station level expressed as

We can obtain the maximum stop time for each station , where Q is the maximum number of stations.

For the capacity-equilibrium utilisation, the maximum number of stops of each train plays a significant role in the operation. It is always set pragmatically, considering the train type and the total train quantity.

In each epoch, the state parameters remember the temporal number of stops for each station and the temporal combined number of stop schemes. All state parameters are updated with the state transition until and meet the maximum number of stops at station z in stop scheme .

Each action corresponds to a substate set containing several stop schemes that satisfy the constraints. However, in a common MDP only one state is related to an action. Thus, we select one stop scheme from the substate set for each action. To maintain the system performance, it is effective to calculate the instant rewards for an action and to choose the best stop scheme from all the alternatives in the substate set. Denote $r_m(s, a)$ as the instant reward of the $m$-th stop scheme in the substate set referring to action $a$ in the current state $s$. The selected scheme is expressed as

$$m^{*} = \arg\max_{1 \le m \le M} r_m(s, a),$$

where $M$ is the maximum stop scheme order in the substate set of action $a$. The stop scheme with the maximum instant reward is selected for action $a$. The entire process is shown in Figure 3. The calculation of the reward (instant reward or assigned reward) is introduced later. After the filter process, the adaptive state set for the next epoch is prepared.
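The selection step amounts to taking the scheme with the largest instant reward in the substate set. A minimal Python sketch is given below; the function name is hypothetical, and the reward is passed in as a callable so that no particular reward formula is assumed.

```python
def select_scheme(candidates, instant_reward):
    """Return (best_scheme, best_reward) among the candidate stop schemes.

    `instant_reward` is any callable mapping a scheme to a number.
    Ties are resolved in favour of the earliest candidate.
    """
    if not candidates:
        raise ValueError("the substate set is empty")
    best = max(candidates, key=instant_reward)
    return best, instant_reward(best)
```

Any scoring function can be plugged in for `instant_reward`, for example one that counts how often a scheme alternates between stopping and skipping.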

3.4.4. Rewards Definition

The skip-stop equilibrium of the stop scheme for each train is used to describe the reward. There are two kinds of rewards: assigned rewards and instant rewards. The assigned reward can be calculated according to the different stop schemes from the adaptive state set. The instant reward is calculated based on the substate set. Let $x_{tz}$ describe whether train $t$ is scheduled to stop at station $z$:

$$x_{tz} = \begin{cases} 1, & \text{if train } t \text{ stops at station } z, \\ 0, & \text{otherwise.} \end{cases}$$

To obtain enhanced performance of the skip-stop equilibrium distribution, it is necessary to consider that each train should skip stops along the line and also avoid stopping stations continuously. The reward (assigned reward or instant reward) of the train is expressed aswhere , ; z is the station index in the ascending order; is the average number of skipped stations for train t; and is the maximum number of stops of train t. The assigned reward is defined as the equilibrium index for each epoch.
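The two ingredients named above, the average number of skipped stations and the avoidance of long runs of consecutive stops, can be computed directly from a train's 0/1 stop flags. The Python sketch below computes only these descriptive quantities; it is not the reward expression of this study, and the function name is hypothetical.

```python
def skip_statistics(scheme):
    """Summarise the skip-stop pattern of one train.

    `scheme` is a sequence of 0/1 flags in ascending station order.
    Returns (average number of stations skipped between consecutive stops,
    longest run of consecutive stops).
    """
    stops = [z for z, flag in enumerate(scheme) if flag]
    if len(stops) < 2:
        raise ValueError("a train needs at least an origin and a destination stop")
    # Number of skipped stations between each pair of consecutive stops.
    gaps = [b - a - 1 for a, b in zip(stops, stops[1:])]
    average_skipped = sum(gaps) / len(gaps)
    longest_run = run = 1
    for gap in gaps:
        run = run + 1 if gap == 0 else 1
        longest_run = max(longest_run, run)
    return average_skipped, longest_run
```

A scheme that spreads its stops evenly has a short longest run and similar gaps between stops, which is the kind of pattern the equilibrium index is meant to favour.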

3.4.5. Transition Probability Definition

Different actions trigger different new states. It is necessary to design the decision process to calculate the state transition probabilities. As MDP theory states, the new state reached after taking an action depends only on the current state and the chosen action; it is not affected by earlier states [26].

With regard to the probability distribution of actions and the related stop schemes, the state transition process can be expressed as the transition probability:where is the probability of the state transitioning from the last state to the new state with action a. The state transition process is shown in Figure 4.

3.4.6. Restrained MDP-Based Stop Plan Model

We complete the structure of the model based on the previous steps to find the decision chain with optimal policy that provides the highest future reward. Through this process, all parameters in each epoch are obtained. When the system state transitions from state to state , the decision process is ended.

The future reward can be calculated iteratively using the equilibrium index of each epoch from the actions. Thus, the system future reward can be calculated by the assigned reward of different actions in each epoch. The optimal policy is attributed to the decision chain with the maximum future reward. Denoting the last state as and the current state as ,where is a discount factor. In terms of this model, the relative value iteration algorithm can be used to solve this problem.
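To make the idea of accumulating discounted rewards over a finite number of epochs concrete, the Python sketch below solves a small finite-horizon MDP by backward induction. This is a generic textbook recursion used here as a stand-in; it is not the relative value iteration algorithm mentioned above, and the function signature and the callables `actions`, `transition`, and `reward` are assumptions made for the example.

```python
def backward_induction(states, actions, transition, reward, horizon, gamma=1.0):
    """Solve a finite-horizon MDP by backward induction.

    actions(s)        -> iterable of actions available in state s
    transition(s, a)  -> list of (probability, next_state) pairs
    reward(s, a, s2)  -> reward for moving from s to s2 with action a
    Returns (values, policy) with values[t][s] and policy[t][s].
    """
    values = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for s in states:
        values[horizon][s] = 0.0  # nothing is earned after the last epoch
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_action, best_value = None, float("-inf")
            for a in actions(s):
                # Expected reward now plus discounted value of the next state.
                q = sum(
                    p * (reward(s, a, s2) + gamma * values[t + 1][s2])
                    for p, s2 in transition(s, a)
                )
                if q > best_value:
                    best_action, best_value = a, q
            values[t][s] = best_value if best_action is not None else 0.0
            policy[t][s] = best_action
    return values, policy
```

In the stop planning setting, each epoch would correspond to one train, the actions to the admissible numbers of stops, and the reward to the equilibrium index of the selected stop scheme.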

4. Experiments

4.1. Instance Dataset

Our train stop planning experiments are based on practical operation data of the Beijing-Shanghai high-speed railway from the Railway Passenger Transport Management Information System from October to December 2017. The rail line has a total length of 1318 km and services 24 high-speed railway stations. Fifty-six representative features were extracted from the collected dataset. Descriptions of the features are listed in Table 1.

To determine the station level in the first stage of the solution framework, service-node features including administrative levels, population, GDP, and distance are considered in the station-level presetting analysis. The values of these features are shown in Table 2; the current level setting by China Railway Company is also listed.

The properties of 115 trains are collected, including the passenger demand, train OD, number of trains, number of stop schemes, total number of stops, and average number of skip stops for trains, which is the average equilibrium index of the trains, as shown in Table 3.

4.2. Clustering Analysis to Update Station Level

To obtain a better station-level input, the fuzzy c-means clustering method introduced in Section 3 is used to reclassify all stations, to embed city features into the station-level parameter.

Three optimal clusters are found, which are plotted in Figure 5(a). It is observed that cluster A is associated with a higher city administrative level, higher GDP, and higher station serviced rate. Cluster B is associated with samples that have a lower city administrative level, GDP, and station serviced rate but higher than cluster C. It can be further inferred that stations in cluster A have the highest level because they have the best feature value. The stations in cluster B have lower levels than stations in cluster A but higher levels than stations in cluster C. This inference can be validated by plotting the average feature values of these clusters, as shown in Figure 5(b).

Cluster C contains several stations without additional details in this step. To further analyse these samples, we rerun the clustering models for cluster C samples only to further explore the station characteristics and station-level structure. The results are shown in Figure 6(a).

As shown in Figure 6, all stations in cluster C are strongly correlated with three features: city administrative level, passenger demand, and station serviced rate. Cluster C-1 is in the highest region for these three features, and cluster C-3 is in the lowest region. It was found that the station level gradually decreased from cluster C-1 to cluster C-3. The results suggest that cluster C can be further divided into three levels. This is further verified by the average feature values in Figure 6(b). Thus, all stations can be finally classified into five levels, as shown in Table 4.

4.3. Train Stop Planning Results and Analysis

With the input railway data and station level, the RMDP-based method is applied to solve the stop plan problem. The optimal train stop plan is displayed in Figure 7. It is observed that Nanjing South is a special station in this railway line because all trains must stop there due to the technical operation constraints. Thus, in this example, the stop rate for the Nanjing South station is set to 1.

Compared with the original stop plan shown in Table 5, the plan generated by the RMDP model schedules 2% fewer stops. Because each train stop incurs operational expense, the proposed method can reduce the total operation cost.

The equilibrium index distributions for all trains in the two stop plans are shown in Figure 8. For simplicity, the average equilibrium indices of the two stop plans were also plotted. It is observed that the trains in the optimal strategy are in a higher region than those in the original plan. Thus, the trains in the optimal strategy with a greater average equilibrium index correspond to better service frequency for each station. This is also an improvement in passenger travel convenience.

With the given passenger flow demand, the equilibrium index of passenger flow can be obtained for both stop plans shown in Figure 9. As shown in Figure 9(a), the change rate of the curve of the original plan is considerably sharper than that of the optimal strategy. It can be estimated by the variance in the passenger flow for both results shown in Figure 9(b). With the new approach, the train capacities are utilised more efficiently in each section. It is beneficial to reduce wasted train capacity and overcrowding. Reducing the wasted train capacity could increase the passenger load factor; reducing overcrowding could improve the sense of comfort. This suggests that the optimal stop plan could provide much higher revenue and a higher quality of service for passengers.

To further analyse the relationship between the number of stops (ST) under the proposed approach and its related features (including population, passenger demand, and GDP), the standard distributions of the features at each station are plotted in Figure 10(a). Because the city administrative level is fixed, its effect on the change in the number of stops can be ignored; hence, we consider only the other features. Figure 10(a) shows that the trend of these features is similar to the trend of the number of stops, with passenger demand tracking the number of stops most closely. This is verified in Figure 10(b), where the variance between passenger demand and the number of stops is the smallest. This result indicates that the optimal strategy adapts well to passenger demand, which helps increase revenue through improved efficiency.

5. Conclusion

This study applied clustering analysis and MDP machine learning techniques to analyse the significant features related to the stop plan and proposed a data-driven optimisation framework for the train stop plan based on real-world train operational data. Service-node features are adopted as important characteristic elements of stations. To make the qualitative features more effective, a clustering analysis technique was used to develop a quantitative analysis that can be applied directly to the optimisation model. Different average feature values of clusters correspond to different station levels. Accordingly, the stop plan was optimised over successive epochs described by an RMDP model that considers constraints related to stop planning, given the known passenger demand and the pre-obtained total number of trains for each OD. A restrained MDP-based stop plan model was proposed to improve the stop plan using the relative value iteration algorithm. A case study was performed on the Beijing-Shanghai high-speed railway line. The computational results revealed that the optimal train stop plan solution is better than the original plan in terms of operation cost control, service quality improvement, and passenger demand adaptiveness. Furthermore, the proposed approach can efficiently solve the stop plan problem with a simpler solution algorithm.

In future research, we will use this approach as the foundation for adjusting the stop plan and will combine it with the stop schedule to explore its interaction with the train timetabling problem. We also intend to investigate different machine learning methods to achieve solution improvements and faster computation.

Data Availability

The data used to support the findings of this study are available from the corresponding author or the first author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key Research and Development Plan (grant no. 2017YFB1200701) and the National Natural Science Foundation of China (grant no. U1834209).