Abstract

The development of high-speed railways (HSR) in China has attracted a large number of passengers from highways and aviation to railways because of their comfort and high speed. In this context, HSR passenger transportation can improve operating income by optimizing ticket allocation. Here, we propose an optimization method for multiclass-price railway passenger ticket allocation under high passenger demand. First, to address the “censored data” problem in railway passenger demand forecasting, we construct an unconstrained model of railway passenger demand and estimate the unconstrained demand with an expectation-maximization algorithm. On this basis, we use gray neural networks (GNNs) to predict the passenger demand of different origins and destinations (ODs). According to the prediction results, we propose two ticket allocation methods based on operation and capacity control, an accurate predivided model and a fuzzy predivided model, and we solve them with a particle swarm optimization algorithm. Lastly, we use a case study to show that the proposed ticket allocation method can meet passengers’ needs and yield good economic benefits.

1. Introduction

China’s high-speed railway (HSR) is developing rapidly. By the end of 2016, the operating mileage was about 23,000 km, indicating that the China HSR network had been initially completed (Figure 1). Figure 2 shows the annual growth of China’s HSR mileage.

China’s HSR has high fixed construction costs and low passenger revenue. Therefore, except for the Beijing-Tianjin intercity line, most HSR lines still operate at a loss. High-speed rail transportation has the advantages of high speed, punctuality, low environmental impact, low pollution, large transportation volume, and high safety performance. However, the rapid development of highways and air transportation has also diverted part of the HSR passenger demand.

Therefore, HSR companies need to use management methods to optimize the allocation of existing transportation resources, attract more passengers, and increase revenue without increasing costs. At the same time, due to the continuous growth of China’s per capita income, passenger demand has changed from uniform to diversified. The existing historical data cannot fully reflect the actual demand of passengers, which creates the need for unconstrained demand forecasting.

Cross proposed that revenue management is a method to maximize economic benefits by subdividing the market and dynamically forecasting demand [1]. Demand unconstraining is the process of estimating the distribution parameters of the “initial demand” of historical passengers before “spillover” and “reproduction” [2]. When revenue management is applied in the HSR passenger transport market, we can analyze the market demand, identify the demands of different passenger market segments, and formulate reasonable fare grades through market segmentation. At the same time, unconstrained demand forecasting can infer the unconstrained demand from the censored data, identify the passengers who could not book their preferred ticket because of booking limits, and, through more accurate demand analysis, improve both the passenger revenue of the railway department and the service quality. Therefore, introducing revenue management and unconstrained demand forecasting into China’s HSR passenger transport market can achieve a rational allocation of existing high-speed rail transport resources and capacity.

The first part of the paper explains why revenue management is needed and analyzes the feasibility of introducing it in HSR passenger transport from the perspectives of adaptability and social environment. The second part reviews the literature on unconstrained demand and railway revenue management. The third part establishes the corresponding models according to the demand prediction and capacity control methods of revenue management. The fourth part uses the Wuhan-Guangzhou HSR to show that our method can bring economic benefits to the railway agency. Finally, the conclusions and limitations of this paper are presented.

2. Literature Review

Due to its complexity, demand forecasting occupies most of the resources of the revenue management system [3], but the theoretical research on it has not received the same attention [3–6]. Demand data censoring is the most complex problem, and ignoring the effect of demand censoring will reduce the accuracy of forecasting [7, 8]. The early unconstrained estimation method is mainly a single-compartment method. Swan [9] proposed the spill model, which laid the foundation for the unconstrained demand estimation model. Richard [10] used the EM method to deal with “censored” data in the field of aviation revenue management. The multicabin approach can effectively avoid “vertical recapture” of different price classes for the same flight, so it is closer to the actual situation. At the same time, in the practice of revenue management, the multidistribution assumption of unconstrained demand is more in line with the demand characteristics in different situations. In addition to the normal distribution and the gamma distribution [11], scholars also proposed the lognormal distribution and the gamma distribution [12], and the multidistribution hypothesis [13].

Since 1999, researchers have shown through analysis that applying revenue management to the railway industry is feasible and of great value [14, 15]. Zhou et al. [16] analyzed the feasibility of applying revenue management to China's high-speed rail. The authors in [17–19] studied the feasibility of applying revenue management to Chinese railways from different perspectives. Yao et al. [20] showed that a floating fare strategy can improve the profit of HSR. Wang et al. [21] solved the problem of capacity allocation under the new transport requirements of the multimodal transport system by simulating the seat reservation system. Valtteri [22] analyzed four differences between air transportation and railway transportation. In 2004, Deutsche Bahn also launched a new plan, PEP [23], which aims to benefit its long-distance passenger transport. Cinzia [24] proposed an online ticketing model that makes ticket refunds and ticket changes common, so it is necessary to use revenue management to predivide the market. Abe [25] introduced and discussed the application of revenue management theory in Japan and Portugal. Other studies analyze the revenue management model of railway passenger transportation and study the timing of passengers purchasing air tickets [26, 27].

The research on railway revenue management methods mainly focuses on three aspects: demand forecasting, differential pricing, and capacity control. For predicting railway passenger flow, there are mainly GA wavelet algorithms [28], logit models [29], and the traditional four-stage method [30]. For differential pricing, there are mainly single fare models and multirange multilevel dynamic fare models [31, 32], pricing strategies based on different populations [33], and dynamic pricing strategies [34]. Hetrakul and Cirillo [27] used the discrete selection method (DAC) to optimize the model and clarify the relationship between seat and price. Xiao et al. [35] proposed an allocation model suitable for intercity high-speed trains and made the passenger seat levels flexible and changeable.

Generally speaking, research on revenue optimization for railways focuses on basic demand forecasting, differential pricing, and capacity control, especially the latter two. However, there is little research on HSR ticket allocation optimization based on unconstrained demand.

Under high passenger demand, the demand for high-speed rail travel may exceed the capacity limit, but the reservation system does not record the demand that exceeds the capacity limit. Therefore, during peak demand, the system records inaccurate demand data, that is, censored data. In practice, there are two main ways of dealing with this problem. One is to remove the censored data and consider only the data that are not affected by the booking limits when estimating the demand distribution and forecasting. The other regards the demand as the real demand that would be observed without the booking limits of the system, that is, unconstrained processing. Therefore, we propose an HSR ticket allocation optimization method based on the unconstrained processing of censored data to improve the accuracy of prediction.
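The censoring effect described above can be illustrated with a minimal Python sketch. The demand values and the booking limit of 100 are hypothetical, and the helper name `observe` is introduced here only for illustration; it is not part of any reservation system.

```python
def observe(true_demand, booking_limit):
    """Return what a reservation system would record for one departure.

    The system stores at most `booking_limit` bookings, so any demand
    above the limit is lost. The flag marks records that hit the limit,
    i.e. records that should be treated as censored.
    """
    recorded = min(true_demand, booking_limit)
    return recorded, recorded >= booking_limit


# Hypothetical true demand on five departure days and a hypothetical limit.
true_demand = [80, 95, 120, 130, 70]
limit = 100

records = [observe(d, limit) for d in true_demand]
recorded = [r for r, _ in records]
censored = [flag for _, flag in records]

mean_true = sum(true_demand) / len(true_demand)
mean_recorded = sum(recorded) / len(recorded)
print(recorded, censored, mean_true, mean_recorded)
```

In this toy case the mean of the recorded data understates the mean of the true demand, which is the bias that unconstrained processing is meant to remove.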

3. The HSR Ticket Allocation Optimization Method

3.1. Method Procedures

The HSR ticket allocation optimization method has three main stages (Figure 3). In the first stage, the method performs an unconstrained demand estimation on the historical data recorded by the HSR system. The unconstrained method addresses the “censored data” that cannot reflect the real needs of passengers and restores the “actual demand”. In the second stage, we predict the real travel demand of passengers, using the “actual demand” as the input variable of the gray neural network prediction model. In the third stage, we apply the accurate and fuzzy predivided models to allocate tickets on different OD sections of the HSR line, compare the results of the two models, and choose the better one as our final result.

3.2. Unconstrained Demand Model

Here, we propose an unconstrained estimation method for HSR passenger demand based on the normal distribution and an EM algorithm for demand unconstraining. The specific process is shown in Figure 4. First, we extract historical demand data from the HSR reservation system and determine whether the data are censored. A dataset with censored data gives inaccurate demand data. Then, we apply unconstrained demand estimation to the censored data, and the parameters of the prediction model are initialized under the normal distribution assumption. Finally, the dataset that contained censored data is combined with the dataset without censored data, and together these data are used in the next stage of demand forecasting.

3.2.1. The Model Symbol

We assume that the passenger demand distribution for a certain OD zone and a certain seat category on a certain departure date can be approximated by a normal distribution. Thus, we use unconstrained demand estimation based on normally distributed passenger demand. Because the EM (expectation-maximization) algorithm is based on the idea of iteratively estimating incomplete data, it is a relatively simple algorithm for computing maximum likelihood estimates of the distribution parameters from a dataset containing censored data. The advantage of the EM algorithm is that it increases the value of the likelihood function at each iteration and eventually converges to a relatively stable point. Since the EM algorithm used here is also based on the normal distribution, we assume before using it that the demands for different price levels are independent of each other. That is to say, if the reservation system for a certain HSR ticket category is closed, the demand for this category will not be converted into demand for higher or lower categories.

The symbols used in the unconstrained demand model for high-speed rail passenger transport are listed in Table 1.

3.2.2. Model Formulation

Passenger demand in the presale period is approximately normally distributed, which means that demand requests are issued according to the normal distribution curve. The demand follows the normal distribution with the parameters $(\mu, \sigma^2)$, and its density function is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

The model building process is as follows:

(1) Judge the state of the booking system at each observation point. If the system is open at the observation point, it can accept a booking request when demand arrives. Conversely, if the system is closed at the observation point, it does not accept booking requests even when there is demand.

(2) When the system is in the closed state, the EM algorithm is used for unconstrained processing. The difference between the demands at two adjacent observation points is the newly added demand in the system. Depending on whether the system is open or closed, different relations hold between the recorded increment and the actual demand. The purpose of the EM estimation is to determine the booking demand between two adjacent observation points when the system is closed.

(3) Parameter initialization. The observed data are used to derive the initial parameter values that determine the demand distribution. During initialization, it may happen that all data in the sample are “censored” data limited by the booking system; this case is handled by a separate expression.

(4) The calculation process of step E. Step E replaces the true value of a censored observation, which is unknown, with its expected value. The observation is first standardized with the current mean and standard deviation so that the properties of the standard normal distribution can be applied, and the result is substituted back into the expression for the expectation. This gives the expected passenger booking demand in the EM algorithm. Step M is then taken to maximize the likelihood given the expected booking demand. A simple approach to maximization is differentiation: for an expression with two parameters, take the partial derivative with respect to each parameter and set it equal to zero to obtain the parameter values.

(5) The calculation process of step M. Step M re-estimates the parameters of the demand distribution using the original data and the unconstrained demand dataset obtained in step E. Since the arrivals of passengers are assumed to be independent of each other, the total probability is the product of the individual probabilities, which gives the likelihood function. To simplify the calculation, the logarithm of the likelihood function is taken and the constant term is ignored. The likelihood is then maximized by taking its partial derivatives with respect to the two parameters, and setting both derivatives to 0 gives the updated parameter estimates.

(6) Determine whether the algorithm has converged. Set an error limit (0.001 here). If the difference between the means of two successive iterations is not below the limit, take the updated estimates as the current parameters and repeat steps 4 and 5. If the difference between the two means is less than 0.001, take the latest estimates as the final mean and standard deviation, and the algorithm terminates.

3.3. Demand Forecasting Model
3.3.1. Model Formulation

(1) Establish a gray differential equation and transform it. First, a new sequence is obtained by accumulating the original data. The whitening differential equation of the accumulated sequence is then established, and solving it with the first data point as the initial condition gives the time response function. The time response function assumes that the existing data follow the change rule of a certain function: it fits a function to a limited set of discrete data, derives the change rule of the data, and forecasts the trend of the data according to that rule. Assuming that the parameters of this function are known, it is taken as the time response function of GNNM (1,1). Given the number of elements in the original sequence, the discrete time response function is then transformed into a form that can be mapped onto a network.

(2) Map the transformed response function into the neural network structure, define the weights between different layers, and determine the activation function between different neurons.

Map the above formula into the neural network and get the GNNM (1,1) structure, which is shown in Figure 5.

The quantities shown in Figure 5 are the input sample value, the sample size, the number of training iterations, the weight values, the measured value of the output, the simulated value of the output, and the output error.
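The gray part of the model, that is, the accumulation step, the whitening equation, and the time response function, can be sketched with a plain GM(1,1) fit in Python. This is a sketch under stated assumptions rather than the GNNM (1,1) network itself: the neural network mapping is omitted, the development coefficient `a` and the gray input `u` are estimated by ordinary least squares instead of by network training, and the function names and the sample series are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate (a, u) of the whitening equation dx1/dt + a*x1 = u."""
    # Accumulated sequence x1(k) = x0(0) + ... + x0(k).
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: means of adjacent accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    # Least squares for the gray differential equation y(k) = -a*z(k) + u.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    a = -(m * szy - sz * sy) / det
    u = (szz * sy - sz * szy) / det
    return a, u


def gm11_forecast(x0, a, u, steps):
    """Forecast the next `steps` values of the original sequence (a != 0)."""
    def x1_hat(t):
        # Time response function with x1_hat(0) = x0[0].
        return (x0[0] - u / a) * math.exp(-a * t) + u / a

    n = len(x0)
    # Differencing the accumulated forecast restores the original scale.
    return [x1_hat(t) - x1_hat(t - 1) for t in range(n, n + steps)]


# Hypothetical demand series growing by 10% per period.
series = [100.0 * 1.1 ** k for k in range(6)]
a_hat, u_hat = gm11_fit(series)
next_value = gm11_forecast(series, a_hat, u_hat, 1)[0]
print(a_hat, u_hat, next_value)
```

A negative `a` corresponds to a growing series. On this smooth exponential series the one-step forecast stays close to the true next value of 100 × 1.1⁶, which illustrates why gray models are often used when the sample is small.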

3.3.2. Model Solution

(1) Initialize the parameters of the above model. The activation function of layers 1, 3, and 4 is a linear function, whereas the second layer, which accounts for the final stable state of the system, uses a sigmoid-type function. Three of the above weights remain unchanged.

(2) Use the forward algorithm to calculate the output of each layer for the input samples.

(3) Use the backward algorithm to calculate the local gradients of the output neuron and of the hidden neurons.

(4) Modify the weights and the threshold according to the results of the backward algorithm. The step size starts from an initial value and is continuously reduced during the training process to achieve rapid convergence.

(5) Repeat the above steps until the error converges to the specified range or the specified number of training iterations is completed.

(6) Enter new samples and repeat the above steps until all samples are trained. Then forecast the new value.

3.4. O&D Ticket Allocation Model

HSR trains pass through multiple stops during a single operation. The HSR currently implements a three-level fare system, so the control of HSR O&D is a problem of seat inventory control in multiple fare classes between multiple OD sections. Figure 6 shows the operation of a single train through four sections and five stations. Each curve represents a different OD.

3.4.1. Model Assumptions

To simplify the construction of the model and its subsequent solution, we make the following assumptions before modeling:

(1) All the models are based on the ODs of a single train, and the interactions between different trains are not considered.

(2) All tickets are sold at the stopover stations of the train, and the effect of sales restrictions is not considered.

(3) Fares follow the current fare setting method, and fare discounts are not considered.

(4) Refunds and no-shows (passengers who do not board the train after purchasing a ticket) are not considered.

(5) The transport capacity of fixed lines is fixed and does not change with time.

(6) The fares between different ODs are known and do not change during the presale period.

3.4.2. The Accurate Predivided Model

Assume that the operating area of a train has N sections and therefore N+1 stopover stations, which can serve N(N+1)/2 OD travel demands. The transport capacity of the train is fixed, and the sections are indexed from 1 to N. Each OD pair is identified by its origin and destination stations, with the origin preceding the destination. The index k denotes three different fare levels, representing business seats, first-class seats, and second-class seats, respectively. Each OD pair has a fare for level k and a demand for level k.
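The relation between stations, sections, and OD pairs can be checked with a short Python sketch. The station numbering from 1 to N+1 and the helper names `od_pairs` and `sections_used` are assumptions made here for illustration.

```python
from itertools import combinations


def od_pairs(num_stations):
    """All (origin, destination) pairs with the origin before the destination."""
    return list(combinations(range(1, num_stations + 1), 2))


def sections_used(od):
    """Sections occupied by an OD pair; section n joins stations n and n + 1."""
    origin, destination = od
    return list(range(origin, destination))


# Four stations give N = 3 sections and N(N+1)/2 = 6 OD pairs.
pairs = od_pairs(4)
print(pairs)
print({od: sections_used(od) for od in pairs})
```

A seat sold for a long OD pair occupies every section it passes through, which is why the capacity constraint has to be checked section by section rather than per OD pair.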

The accurate allocation model of OD ticket amounts finds the optimal allocation quantity for each fare level in each OD section, with the goal of maximizing the revenue of the transportation enterprise. If a protection level is set for the quantity allocated to a certain OD, the allocated quantity must lie between the protection level and the demand. That is, the number of seats assigned to a certain OD should be at least its protection level, which should be determined with regard to the social significance of railway transportation, and at the same time it should not exceed the passengers’ demand for this section.

The revenue is the sum, over all OD pairs and fare levels, of the fare multiplied by the allocated quantity.

The model maximizes this revenue subject to the bounds above and to the transport capacity of the train on each section, with integer allocation quantities.
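The objective and constraints of the accurate predivided model can be sketched as a revenue function and a feasibility check in Python. This is a sketch under stated assumptions: a single fare class is used, the fares, demands, and protection levels are hypothetical rather than taken from the tables of this paper, and the function names are introduced here for illustration. Only the capacity of 923 seats comes from the case study.

```python
def section_load(allocation):
    """Seats occupied on each section; section n joins stations n and n + 1."""
    load = {}
    for (origin, destination), seats in allocation.items():
        for n in range(origin, destination):
            load[n] = load.get(n, 0) + seats
    return load


def is_feasible(allocation, demand, protection, capacity):
    """Check the protection/demand bounds and the capacity of every section."""
    within_bounds = all(
        protection[od] <= seats <= demand[od] for od, seats in allocation.items()
    )
    within_capacity = all(v <= capacity for v in section_load(allocation).values())
    return within_bounds and within_capacity


def revenue(allocation, fare):
    """Total revenue: fare times allocated quantity, summed over OD pairs."""
    return sum(fare[od] * seats for od, seats in allocation.items())


# Hypothetical data for four stations (six OD pairs), one fare class.
fare = {(1, 2): 165, (1, 3): 315, (1, 4): 465, (2, 3): 150, (2, 4): 315, (3, 4): 170}
demand = {(1, 2): 300, (1, 3): 200, (1, 4): 600, (2, 3): 150, (2, 4): 250, (3, 4): 200}
protection = {od: 50 for od in fare}
capacity = 923

plan = {(1, 2): 200, (1, 3): 150, (1, 4): 500, (2, 3): 100, (2, 4): 170, (3, 4): 150}
print(section_load(plan), is_feasible(plan, demand, protection, capacity),
      revenue(plan, fare))
```

An optimizer only has to search over allocations for which `is_feasible` holds and pick the one with the largest `revenue`.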

3.4.3. The Fuzzy Predivided Model

In controlling the stock of HSR passenger tickets, revenue is maximized by designing limits on the number of seats for different OD sections and different fare classes. For each OD pair of a train and each fare class k, the model uses three quantities: the nested booking limit, the allocated quantity, and the expected revenue under the booking limit. The following relationship is involved:

Then the total model can be expressed as follows:

3.4.4. Model Solution

Particle swarm optimization is an intelligent optimization algorithm whose principle is relatively simple, which involves few parameters, and which readily approaches the global optimum, so it is a suitable method for solving integer programming problems [36]. We follow Li Lihui’s particle swarm algorithm, which is designed for the ticket allocation problem, to solve the accurate predivided model and the fuzzy predivided model for OD tickets [37]. The standard process of particle swarm optimization is shown in Figure 7.

The problem to be solved in this section is the control of seats for the OD pairs. Each OD section has several ticket price levels, so in particle swarm optimization each particle is a vector whose dimension is the number of OD pairs multiplied by the number of price levels. The specific solution process of the particle swarm optimization (PSO) algorithm follows that of the study on revenue optimization of HSR based on passenger behavior [37]. The swarm consists of a fixed number of particles, and for each particle we track its current position and its result after a given number of iterations. The fitness function of each particle is obtained from the objective function of the model: in the accurate predivided model and in the fuzzy predivided model, the fitness of a particle is the objective value of the respective model at the position of the particle.

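For the (t+1)th update of a particle, its velocity is computed from its current velocity, the difference between its best historical location and its current location, and the difference between the best location of all particles and its current location. The latter two terms are scaled by uniformly distributed random numbers and by the learning factors, which are set to fixed values. The location of the particle after t+1 iterations is its location after t iterations plus the updated velocity.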

Therefore, the steps to solve the above particle swarm algorithm are as follows:

(1) Set the initial location of each particle in the swarm and the initial revenue.

(2) Calculate the updated velocity, obtain the new particle location, and then obtain the corresponding revenue.

(3) If the new location improves the revenue and satisfies the constraints, update the best location of the particle and, where appropriate, the best location of all particles; then go on to judge the remaining particles.

(4) Update the velocity and location of all particles and repeat the above steps until the optimal location is found.
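The steps above can be sketched as a small particle swarm optimizer in Python. This is a sketch under stated assumptions, not the algorithm of [37]: it uses one fare class, an inertia weight `w` and learning factors `c1 = c2 = 1.5` chosen here for illustration, rounding to integer seats, and a rejection rule that gives infeasible positions a fitness of minus infinity. The fares, demands, and protection levels are hypothetical. The zero initial velocity, the 300 evolutions, and the capacity of 923 seats follow the case study.

```python
import random


def section_load(x, ods):
    """Seats occupied on each section for allocation vector x."""
    load = {}
    for (origin, destination), seats in zip(ods, x):
        for n in range(origin, destination):
            load[n] = load.get(n, 0) + seats
    return load


def pso_allocate(ods, fare, demand, protection, capacity,
                 swarm=30, iters=300, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo = [protection[od] for od in ods]   # lower bound: protection level
    hi = [demand[od] for od in ods]       # upper bound: forecast demand
    dim = len(ods)

    def score(p):
        x = [round(v) for v in p]         # integer seats
        if any(v > capacity for v in section_load(x, ods).values()):
            return float("-inf")          # reject infeasible positions
        return sum(fare[od] * s for od, s in zip(ods, x))

    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(swarm)]
    # One particle starts at the protection levels, assumed to fit capacity,
    # so that the swarm always holds at least one feasible position.
    pos[0] = [float(v) for v in lo]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_fit = [score(p) for p in pos]
    g = max(range(swarm), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for k in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Move and clamp to the protection/demand bounds.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo[d]), hi[d])
            f = score(pos[k])
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return {od: round(v) for od, v in zip(ods, gbest)}, gbest_fit


# Hypothetical data for four stations (six OD pairs), one fare class.
fare = {(1, 2): 165, (1, 3): 315, (1, 4): 465, (2, 3): 150, (2, 4): 315, (3, 4): 170}
demand = {(1, 2): 300, (1, 3): 200, (1, 4): 600, (2, 3): 150, (2, 4): 250, (3, 4): 200}
protection = {od: 50 for od in fare}
ods = sorted(fare)
best_plan, best_revenue = pso_allocate(ods, fare, demand, protection, capacity=923)
print(best_plan, best_revenue)
```

Rejecting infeasible positions is the simplest way to respect the capacity constraint; a penalty term or a repair step would be common alternatives. The fixed seed makes the run reproducible.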

4. Case Analysis

4.1. Wuhan-Guangzhou HSR

This section takes the Wuhan-Guangzhou HSR as an example. Wuhan-Guangzhou HSR is the first HSR with long mileage in China (Figure 8), and it was put into operation in 2009. The route spans Hubei, Hunan, and Guangdong provinces, passing 17 stations, and the total length is 1068.6 kilometers. It runs through the three major economic regions of Central and South China, Wuhan Metropolitan Area, Changzhutan Urban Group, and the Pearl River Delta. There are four stations and six OD sections involved in this example. The corresponding names and fares of each section are shown in Table 2, where A represents the passenger traffic of the OD section between WH and CSN stations and other letters B, C, ⋯, F have similar meanings.

4.2. Unconstrained Process and GNN Prediction on Demand

From October to December 2017, we collected 80 days of passenger demand data for the Wuhan-Guangzhounan HSR trains G1103, G1109, G1117, G1121, G1123, G1133, G1135, and G1143. We use the “114 ticket network” as the source of ticket demand data because it provides the most comprehensive information on the remaining HSR tickets available for booking. The booking data of the G1117 train are selected for the demand forecast. For comparison, we first train the gray neural network (GNN) directly on the historical demand data of the first 70 days and use the data of the last 10 days for prediction and verification. The prediction results for certain days in some OD sections are not particularly good (Figure 9).

After unconstrained processing of the obtained data, we compare the direct GNN demand forecast with the unconstrained and GNN demand forecast (UGNN). The two kinds of prediction results are compared in Figure 10, and the RMSE values of the two models are compared in Figure 11. In all OD sections, the UGNN model performs better than the direct prediction model. According to Figure 11, the error of the prediction based on the unconstrained data is significantly lower than that of the direct prediction. The UGNN prediction is therefore more reliable and better reflects the real travel needs of passengers.
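The RMSE used for this comparison can be computed as in the following Python sketch. The demand values are hypothetical and only illustrate the calculation; they are not the data behind Figure 11.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical verification data for one OD section.
actual = [100, 110, 120, 130]
direct_forecast = [90, 100, 125, 120]
ugnn_forecast = [98, 108, 122, 128]

print(rmse(actual, direct_forecast), rmse(actual, ugnn_forecast))
```

A lower RMSE on the held-out days indicates that the forecast tracks the demand more closely.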

4.3. Ticket Allocation on OD Sections

There are four stations, six OD sections, and 923 second-class seats involved in this example. The ticket prices and demand distribution among the sections are shown in Table 3. Figure 12 shows the distribution of tickets for different sections in a bar chart.

On the basis of these data, the particle swarm optimization algorithm is used to design the ticket allocation scheme for the different sections of a train. The swarm size and the number of iterations per particle are fixed in advance, the six OD sections correspond to six dimensions, the number of evolutions of the particle swarm is set to 300, the fare class k is 1 (only second-class seats are considered), and the initial velocity of each particle is 0. We use MATLAB to solve the problem. The results of the accurate predivided and fuzzy predivided models are shown in Table 4.

The particle swarm optimization results confirm that when the tickets allocated to each section of a train are sold out, that is, when the ticket data can no longer accurately reflect the needs of passengers, the unconstrained demand forecast model better reflects the actual needs of passengers. Table 5 shows the ticket selling results of three modes, the original, the accurate predivided, and the fuzzy predivided, based on the ticket allocation method and the real demand obtained through the unconstrained method.

Figure 12 compares the ticket selling results of the original, accurate predivided, and fuzzy predivided models. With the prediction data processed by the unconstrained method, the ticket revenue of the train within the section is 373,734 yuan after accurate predivision of the tickets for different ODs, and 374,7446 yuan with fuzzy predivision. Without the unconstrained method, the ticket revenue was 351,775 yuan. The method can thus increase ticket revenue by up to 6.80%, and we conclude that the unconstrained method can effectively improve the ticket revenue of HSR.
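The relative revenue gain is obtained by dividing the revenue difference by the original revenue. The Python sketch below applies this to the accurate predivided figure and the original figure quoted above; the helper name `pct_gain` is introduced here for illustration.

```python
def pct_gain(new_revenue, base_revenue):
    """Relative revenue increase over the base, in percent."""
    return 100.0 * (new_revenue - base_revenue) / base_revenue


original = 351_775            # yuan, without the unconstrained method
accurate_predivided = 373_734  # yuan, accurate predivided model

accurate_gain = pct_gain(accurate_predivided, original)
print(round(accurate_gain, 2))
```

For the accurate predivided model the gain is about 6.24%, which lies below the maximum of 6.80% reported for the method.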

China has the largest number of HSR lines in operation, but only a few are profitable. The empirical analysis of the Wuhan-Guangzhou HSR proves that the revenue management method based on unconstrained demand can increase the revenue of HSR and improve the operation. Demand prediction is the basis of revenue management, and the accuracy of prediction directly affects the effect of revenue management, and the improvement of precision plays a positive role in revenue management. In the empirical analysis, the accuracy of data prediction has been effectively improved after the unconstrained method. Therefore, it is necessary to pay enough attention to the unconstrained method.

5. Discussion and Conclusion

In this paper, we propose a method for optimizing the allocation of HSR tickets based on the unconstrained demand model. The method addresses the fact that under high passenger demand, historical demand data, that is, “censored data”, cannot fully reflect the real needs of passengers. We can effectively solve this problem through an unconstrained method and restore the “real demand” that the ticketing system failed to record.

By using “real demand” as the input variable of the gray neural network prediction model, the prediction accuracy can be effectively improved, and a more accurate prediction is the basis of a reasonable ticket allocation model. Finally, we verified our optimization method with a practical example of China’s HSR, and the total ticket income obtained by this method is up to 6.80% higher than that of the original method without unconstrained processing.

Therefore, we can draw two conclusions. First, it is feasible to introduce ticket allocation optimization methods into China’s HSR. Second, accurate demand forecasting is essential to optimize HSR passenger income. By constructing an unconstrained model of HSR passenger demand and solving it with the EM algorithm, the accuracy of demand forecasting can be improved.

In our proposed optimization method, the gray neural network combines the advantages of the gray system and the neural network system and can obtain more accurate results when the sample size is small.

We build accurate predivided models and fuzzy predivided models for different OD sections and solve them by designing the particle swarm optimization algorithm. The results show that whether it is a fuzzy predivided model or an accurate predivided model, the results are superior to the original method without unconstrained processing.

Due to the uncertainty of passenger transportation needs, national policies support HSR companies in adjusting the number of tickets between different ODs according to demand during the presale period [38]. In addition, with the development of computer hardware technology, big data, and machine learning algorithms [39, 40], the demand forecasting model, differential pricing model, and capacity control model can be used in a ticket allocation optimization system to analyze massive ticket data and obtain better results. Ticket allocation plans that meet the needs of passengers have thus become achievable.

The proposed method here is mainly for HSR transportation with high passenger demand. For ordinary railways, there are also high passenger demands during holidays such as the Spring Festival. Therefore, our method is also applicable to ordinary railways under high demands.

However, our method only considers the unconstrained situation in which passenger demand follows a normal distribution. In the future, it can be extended to other demand distributions. In addition, this article does not consider differential pricing and only considers demand forecasting and capacity control in high-speed rail passenger transportation, which leaves differential pricing as a direction for further study. Finally, we use a static ticket allocation model in the method, and future research may consider incorporating a dynamic ticket allocation model.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The research presented in this paper was partially supported by the National Natural Science Foundation of China (U1934216), Research Project of China State Railway Group Co., Ltd. (K2019X021, N2019X002, and K2018X012), Research Project of China Academy of Railway Sciences (2019YJ109), Characteristic Innovation Projects of Ordinary Universities of Guangdong Province (2019KTSCX007), and State Key Lab of Subtropical Building Science, South China University of Technology (2020ZB20).