Research Article  Open Access
Ping Huang, Chao Wen, Qiyuan Peng, Chaozhe Jiang, Yuxiang Yang, Zhuan Fu, "Modeling the Influence of Disturbances in High-Speed Railway Systems", Journal of Advanced Transportation, vol. 2019, Article ID 8639589, 13 pages, 2019. https://doi.org/10.1155/2019/8639589
Modeling the Influence of Disturbances in High-Speed Railway Systems
Abstract
Accurately forecasting the influence of disturbances in High-Speed Railways (HSR) is of great significance for improving real-time train dispatching and operation management. In this paper, we show how to use historical train operation records to estimate the influence of high-speed train disturbances (HSTD), including the number of affected trains (NAT) and the total delayed time (TDT), taking the timetable and disturbance characteristics into account. We first extracted data on the disturbances and their affected train groups from historical train operation records of the Wuhan-Guangzhou (WG) HSR in China. Then, to capture the similarities and differences among disturbances, we used a K-Means clustering algorithm to classify them into four categories. Next, parametric and nonparametric density estimation approaches were applied to fit the distributions of NAT and TDT for each cluster, and the goodness-of-fit test results showed that the Lognormal and Gamma probability densities are the best functions to approximate the distributions of NAT and TDT of the different disturbance clusters. Finally, the validation results show that the proposed models accurately capture the characteristics of HSTD and can be used in real-time dispatching to predict NAT and TDT once the basic features of a disturbance are known.
1. Introduction
An operating train may encounter various unexpected disturbances such as bad weather, power outages, and facility failures [1], which can lead to considerable losses for both railway managers and travelers. For example, statistics for the Dutch railway network show approximately 22 infrastructure-related delays per day, each lasting 1.7 hours on average [2]. According to the China Railway Corporation, the average departure punctuality at origin stations on China's 23,000-km high-speed rail (HSR) network was as high as 98.8% in 2016. However, owing to various disturbances en route, the average punctuality at final destination stations was below 90%, even though trains delayed by less than five minutes are counted as punctual [3].
When disturbances occur, dispatchers need to anticipate the potential influence of a specific delay: they need to estimate the number of affected trains (NAT) and the total delayed time (TDT) before rescheduling the timetable. Modeling high-speed train disturbances (HSTD) would therefore be very helpful, although it is extremely challenging for two reasons:
(1) Various influencing factors. The influence of railway disturbances depends on many factors, for example, timetable structure, facility conditions, and the experience and preferences of dispatchers, which makes it difficult to express these factors through functional relationships.
(2) Complex train interactions. Because of conflicts over occupied resources and the continuity of train operation, trains interact with one another, and such interactions are difficult to capture with mathematical models.
In practice, skilled dispatchers usually predict HSTD empirically, which leads to inconsistent dispatching, even for the same dispatcher working in different situations. Data-driven approaches have recently gained more attention because they offer better insight into how train delays propagate and better support robust timetabling and real-time dispatching [4]. Together with the availability of train operation records, advanced data-mining techniques enable us to address these problems from a data-analysis perspective. Train operation records can be regarded as the combined outcome of all the interacting influencing factors; mining them therefore provides a new way of examining train interactions arising from heterogeneous factors.
To bridge the gap between empirical and mathematical models, this paper aims to establish data-driven models of the NAT and TDT caused by different types of disturbances, using train operation data from the Wuhan-Guangzhou (WG) HSR in China. To this end, we used a K-Means algorithm to categorize the extracted disturbances according to their influencing factors. Next, we applied five widely used probability density models and two kernel functions to fit the distributions of NAT and TDT, and selected the best model for each cluster based on goodness-of-fit testing. Finally, operation records from the subsequent nine months were used as test data to validate the generalization of the fitted models, showing that these models can be applied to estimate the NAT and TDT of future disturbances.
2. Literature Review
Disturbance management in train dispatching has been an active research topic since the 1970s [5]. More recently, the topic of the INFORMS 2018 Railroad Problem Solving Competition was “Predicting Near-Term Train Schedule Performance and Delay” using operational records. A large number of methods and algorithms have been proposed to improve rail operations, but because historical data were unavailable or insufficient, research was mainly based on simulation and mathematical delay propagation models.
Many approaches have been proposed to manage railway disturbances. Exogenous factors, such as natural disasters and bad weather, and endogenous factors, such as operational interference resulting from equipment failures, man-made faults, railway construction, temporary speed restrictions, defective braking systems, signal and interlocking failures, and excessive passenger demand, can contribute, alone or together, to primary delays [6–8]. In addition, if running and dwell times increase because of unexpected disturbances, knock-on delays to other trains can result [9]. Serious disruptions such as switch or signal failures can, if not managed effectively, cause trains to queue, creating a chain of delayed trains. Experience from the Taiwan High-Speed Rail shows that shortening the maintenance cycle can effectively alleviate train delays caused by signal failures [10]. Several studies have developed statistical models of delays and fitted candidate distributions to them. The Weibull, Gamma, and Lognormal distributions have been adopted in several studies [11, 12]. It has been shown that the distribution of primary delays and the number of affected trains can be well approximated by classical methods such as the Lognormal distribution and linear regression models [13]. A q-exponential function has been used to describe the distribution of train delays on the British railway network [14]. Using spatially and temporally resolved transport data from the UK road and rail networks, with the intense storms of 28 June 2012 as a case study, a novel exploration of the impacts of an extreme event was carried out in [7]. Using HSR operation data, maximum likelihood estimation was applied to determine the probability distributions of the different disturbance factors and of the number of affected trains; however, models of the consequences of primary delays were not established in detail [15].
Probability distribution functions of train arrival and departure delays at individual stations were derived from data on the Beijing-Shanghai HSR [16].
Data-driven studies of delay and disruption management have mainly used regression or distribution-fitting approaches. Van der Meer et al. mined peak-hour data, including rolling stock and weather data, and developed a predictive model based on track occupation data for delay estimation [9]. A data-mining approach was used to analyze rail transport delay chains using passenger train traffic data from the Finnish rail network, but the train running data covered only one month [4]. Murali et al. reported a regression-based delay estimation technique that models delay as a function of train mix and network topology [17]. A statistical analysis of train delays at Eindhoven Station in the Netherlands explained systematic delay propagation, using a robust linear regression model to uncover correlations between arrival delays [18]. Recently, Kecman and Goverde developed separate predictive models for estimating running and dwell times, trained on data for the respective process types [19]. Lessan et al. examined different distribution models for the running times of individual sections of an HSR system and showed that the log-logistic probability density function best approximates the empirical distribution of running times on that line [3]. A hybrid Bayesian network model has also been established to predict arrival and departure delays on the Wuhan-Guangzhou HSR [20].
A review of the literature reveals that only a few studies model the NAT and TDT of disturbances, especially in HSR operations. It is crucial for HSR operating companies to predict and reduce disruptions to train operations, to operate as closely as possible to published timetables, and therefore to assess the severity of delays or interruptions to train services. This can help dispatchers limit delay propagation and prevent disruptions from escalating, through effective timetable design or real-time dispatching decisions. This study aims to fill this gap by conducting a detailed, factor-specific analysis of delays based on empirical data from the WG HSR.
3. Preliminaries
3.1. Data Description
The WG HSR, which is 1,096 km long, is one of the busiest passenger railway lines in China and joins the Guangzhou-Shenzhen, Hengyang-Liuzhou, and Shanghai-Kunming HSRs at the GZS, HYE, and CSS stations, respectively. Trains operating on this line are all equipped with the Chinese Train Control System (CTCS), which allows a maximum speed of 350 km/h, and an Automatic Train Supervision system records the movements of all trains. The WG HSR line carries only high-speed trains and is completely separated from conventional lines, but it connects with the Shanghai-Kunming HSR line at CSS station, the Hengyang-Liuzhou HSR line at HYE station, and the Guangzhou-Shenzhen and Guiyang-Guangzhou HSR lines at GZN station. At these stations, train termination, turnaround, and cross-line operation are allowed. We studied train movements in the northbound direction on this line, that is, from GZS to CSS, as shown in Figure 1.
The data cover 57,796 trains in the GZS-HYE section and 64,547 trains in the HYE-CSS section over the period from March 24, 2015 to November 10, 2016. They comprise the scheduled and actual arrival and departure times of each train at each station, train numbers, dates, and the tracks occupied. Because of the precision of the recording system, all times were recorded in whole minutes, as shown in Table 1.
 
The passage tracks at the station are labeled with Roman numerals, while the dwelling tracks are labeled with Arabic numerals.
To better understand the distributional pattern of disturbance influence, we first analyzed train delay patterns by plotting the arrival and departure delay distributions at HSW station. The histograms in Figure 2 show that both arrival and departure delays follow heavy-tailed distributions, which suggests that the influence of disturbances (NAT and TDT), like many quantities in complex systems, also follows a right-skewed, heavy-tailed distribution.
3.2. Problem Description
In both practice and research, train delays and disturbances are usually classified according to their causes. However, this approach has a drawback: disturbances with different causes sometimes affect train operation through the same mechanism. For example, a track failure in a section and a power supply fault in a section have different causes, but they affect train operation in the same way, because in both cases trains must wait until the section becomes available. Similarly, signal faults and turnout faults have the same effect on train operation, as do speed restrictions imposed for bad weather and for construction. In other words, from the perspective of railway management, it is more meaningful to classify disturbances according to their impacts on train operation. In this research, we therefore classify disturbances based on the key factors that determine their effects on train operation.
4. Clustering Model for Disturbances
4.1. Influencing Factors
To meet passenger demand, train services tend to vary across periods and segments, even on the same HSR line. Train interval is a key factor in delay propagation: a disturbance tends to have more severe effects in periods and segments with short train intervals and milder effects where intervals are longer. Figures 3(a) and 3(b) show the NAT and TDT of disturbances at GZN and HSW; both differ significantly across time periods and segments of the WG HSR. In addition, the disturbance length directly influences its consequences: the longer the disturbance, the more trains are delayed and the larger the total delay. Based on this analysis, the following features for the clustering models and indexes for measuring the influence of disturbances were selected:
(i) Train interval (I): the average scheduled train interval when a disturbance occurs (minutes);
(ii) Occurrence time (T): the time at which a disturbance starts (in railway operation, it can be indicated by the scheduled arrival time of the first delayed train in the affected train group);
(iii) Disturbance length (L): the time span from the start to the end of the disturbance (it can be indicated by the delay of the first train in the affected train group; minutes);
(iv) NAT and TDT: the indexes used to measure the influence of a disturbance.
Figure 3: (a) Spatial-temporal distribution of NAT; (b) Spatial-temporal distribution of TDT.
Figure 4, which shows a real disturbance in the YDW-SG section of the WG HSR line, illustrates how the selected features are calculated. Suppose seven trains were delayed by this disturbance (the first seven trains arrived late at SG station, but the eighth arrived on schedule); these seven trains are then defined as the affected train group. In this case, NAT is seven, TDT is the sum of the arrival delays of these seven trains at SG station, and the interval (I) is the average interval of these seven trains at SG station. The disturbance length (L), defined in this research as the time between the occurrence of the disturbance and the moment the section becomes available again, was approximated by the delay of the first train in the affected train group.
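The feature definitions above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it assumes: arrival records at a single observation station (such as SG in Figure 4) listed in arrival order, times encoded as minutes after midnight, a train counting as on schedule when its delay does not exceed a tolerance (0 minutes here), and I computed as the mean gap between consecutive scheduled arrivals within the group. The class names and the `on_time_tolerance` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainRecord:
    # Scheduled and actual arrival at the observation station, in minutes after
    # midnight (hypothetical encoding; the source records whole minutes).
    scheduled: int
    actual: int

    @property
    def delay(self) -> int:
        return self.actual - self.scheduled


@dataclass
class DisturbanceFeatures:
    T: int     # occurrence time: scheduled arrival of the first delayed train
    I: float   # average scheduled interval within the affected group (min)
    L: int     # disturbance length: delay of the first affected train (min)
    NAT: int   # number of affected trains
    TDT: int   # total delayed time (min)


def extract_features(trains: List[TrainRecord],
                     on_time_tolerance: int = 0) -> Optional[DisturbanceFeatures]:
    """Affected group = consecutive delayed trains from the first train up to
    (not including) the first train whose delay <= on_time_tolerance."""
    group = []
    for train in trains:
        if train.delay <= on_time_tolerance:
            break
        group.append(train)
    if not group:
        return None
    n = len(group)
    # Mean scheduled gap; undefined for a single train, so 0.0 is a placeholder.
    interval = (group[-1].scheduled - group[0].scheduled) / (n - 1) if n > 1 else 0.0
    return DisturbanceFeatures(
        T=group[0].scheduled,
        I=interval,
        L=group[0].delay,
        NAT=n,
        TDT=sum(t.delay for t in group),
    )
```

For seven trains scheduled 5 minutes apart with delays of 20, 17, 14, 11, 8, 5, and 2 minutes followed by an on-time train, this yields NAT = 7, TDT = 77, L = 20, and I = 5.0.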
Based on these indexes, 6,006 disturbances and their consequences for train operations were extracted from the raw data; five examples are shown in Table 2. To validate the proposed model, the extracted data were split chronologically into a training dataset containing 3,154 disturbances from the first 12 months and a validation dataset containing 2,852 disturbances from the following 9 months. Only disturbances longer than 4 minutes were extracted, in accordance with the standard set by the China Railway Corporation. Shorter disturbances can be absorbed by the time supplements distributed in the timetable, tend not to cause delay propagation, and were therefore excluded from the dataset.
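The filtering rule and the chronological split described above can be sketched as follows. Because the source gives only the lengths of the two periods (12 and 9 months), the split date is passed in as a `cutoff` parameter; the `Disturbance` record type and the cutoff used in any example are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class Disturbance(NamedTuple):
    start: datetime   # occurrence time T as a full timestamp
    interval: float   # train interval I (minutes)
    length: int       # disturbance length L (minutes)
    nat: int          # number of affected trains
    tdt: int          # total delayed time (minutes)


MIN_LENGTH = 4  # keep only disturbances strictly longer than 4 minutes


def filter_and_split(
    records: List[Disturbance], cutoff: datetime
) -> Tuple[List[Disturbance], List[Disturbance]]:
    """Drop short disturbances, then split chronologically at `cutoff`."""
    kept = sorted((r for r in records if r.length > MIN_LENGTH), key=lambda r: r.start)
    training = [r for r in kept if r.start < cutoff]
    validation = [r for r in kept if r.start >= cutoff]
    return training, validation
```

Splitting by time rather than at random keeps the validation set strictly in the future of the training set, which matches the paper's goal of estimating the influence of future disturbances.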

4.2. K-Means Clustering
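K-Means is a classic and widely used unsupervised clustering algorithm that is often applied to high-dimensional datasets. For a given dataset $D = \{x_1, x_2, \ldots, x_N\}$ and a partition into $K$ clusters $C = \{C_1, C_2, \ldots, C_K\}$ obtained from initialized cluster centers, the objective of K-Means is to minimize the sum of squared errors
$$E = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert_2^2, \qquad (1)$$
where $\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x$ is the mean vector of $C_k$. Equation (1) measures how closely the samples in each cluster gather around their mean vector: the smaller $E$, the tighter the clusters.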
The core principle of K-Means is to choose K points in the feature space as initial centers and assign each sample to its nearest center. The centers are then updated iteratively to reduce $E$ until a stopping condition is satisfied (for example, the cluster assignments no longer change). The K-Means clustering algorithm is summarized in Algorithm 1, and more details can be found in [21].
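The assignment and update steps described above can be sketched in NumPy as the standard Lloyd iteration, together with the z-score standardization applied to the inputs before clustering. This is an illustrative sketch, not the authors' code: random initialization from the data, the iteration cap, and the seed are assumptions. Library implementations such as scikit-learn's `sklearn.cluster.KMeans` additionally support k-means++ initialization and multiple restarts (`n_init`).

```python
import numpy as np


def standardize(X):
    """Z-score each feature column; return the statistics for reuse on new data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std, mean, std


def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged -> converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    sse = float(((X - centers[labels]) ** 2).sum())  # objective E of Eq. (1)
    return labels, centers, sse
```

In the setting of this paper, `X` would hold one row per disturbance with the columns I, T, and L, standardized with `standardize` before calling `kmeans(Z, 4)`.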

4.3. Model Performance
The performance of clustering models is commonly evaluated from two perspectives: the compactness of the samples within each cluster and the separation between clusters. Widely used evaluation indexes are based on distance and on covariance [22, 23]; both favor small values among samples in the same cluster and large values among samples in different clusters. To evaluate the clustering models systematically, we chose one distance-based and one covariance-based index, namely, the Silhouette Coefficient (SC) and the Calinski-Harabasz Score (CHS), defined in (2) and (3), respectively:
$$SC = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad (2)$$
where $a(i)$ is the average distance between sample $x_i$ and the other samples in its own cluster, and $b(i)$ is the smallest average distance between $x_i$ and the samples of any other cluster; SC ranges over $[-1, 1]$.
$$CHS = \frac{\operatorname{tr}(B_K)}{\operatorname{tr}(W_K)} \cdot \frac{N - K}{K - 1}, \qquad (3)$$
where $N$ is the number of samples, $K$ is the number of clusters, $B_K = \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^{T}$ is the between-cluster dispersion (covariance) matrix, $W_K = \sum_{k=1}^{K} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{T}$ is the within-cluster dispersion matrix, $n_k$ and $\mu_k$ are the size and mean vector of cluster $C_k$, $\mu$ is the mean vector of all samples, and $\operatorname{tr}(\cdot)$ denotes the matrix trace. According to (2) and (3), better clustering results have larger SC and CHS.
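Both indexes can be computed directly from their definitions, as in the NumPy sketch below. It is illustrative only; in practice, scikit-learn provides `sklearn.metrics.silhouette_score` and `sklearn.metrics.calinski_harabasz_score`. The sketch uses Euclidean distance and assigns a silhouette of 0 to singleton clusters, which is scikit-learn's convention and an assumption here.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient (Euclidean distance); needs >= 2 clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = d[i, own].sum() / (n_own - 1)  # excludes d(i, i) = 0
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def calinski_harabasz(X, labels):
    """tr(B_K) / tr(W_K) * (N - K) / (K - 1)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, clusters = len(X), np.unique(labels)
    k = len(clusters)
    overall = X.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        # The trace of an outer product v v^T equals the squared norm of v.
        tr_b += len(members) * ((centroid - overall) ** 2).sum()
        tr_w += ((members - centroid) ** 2).sum()
    return (tr_b / tr_w) * (n - k) / (k - 1)
```

For the four points (0, 0), (0, 1), (10, 0), (10, 1) split into two vertical pairs, SC is about 0.9002 and CHS is exactly 200.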
To obtain reasonable clustering results, we also applied other clustering algorithms, including BIRCH, Spectral, and Agglomerative clustering, and varied the number of clusters (K) of each model from two to twenty to categorize the disturbances. After the input data were standardized, clustering was performed; the results of each model are shown in Figure 5. K-Means was first selected as the best clustering model among the candidates, as it achieved the best SC and CHS for almost every number of clusters. The exception was K = 2, for which the BIRCH model achieved the highest SC but the lowest CHS. Because clustering models should be evaluated with both distance-based and covariance-based indexes to obtain systematic results, we chose K-Means, which scores highly on both SC and CHS, as the clustering model for railway disturbances. When choosing the value of K, we found that three, four, and five clusters were all plausible for K-Means. We therefore rescaled the SC and CHS values of the K-Means model to the range from zero to one and chose four clusters, which yielded the highest sum of rescaled SC and CHS. Therefore, a K-Means algorithm with four clusters was selected as the clustering model for HSR disturbances.
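The rule used to choose K (rescale SC and CHS to [0, 1], add them, take the maximum) can be sketched as follows. The sketch assumes min-max rescaling over the candidate K values passed in; the SC and CHS numbers in the usage example are made up for illustration and are not the values plotted in Figure 5.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def choose_k(sc: Dict[int, float], chs: Dict[int, float]) -> int:
    """Return the K with the largest sum of min-max rescaled SC and CHS."""
    ks = sorted(sc)
    sc_n = min_max([sc[k] for k in ks])
    chs_n = min_max([chs[k] for k in ks])
    scores = {k: s + c for k, s, c in zip(ks, sc_n, chs_n)}
    return max(scores, key=scores.get)


# Illustrative (hypothetical) index values for K = 3, 4, 5:
best = choose_k({3: 0.40, 4: 0.38, 5: 0.33}, {3: 1500.0, 4: 1650.0, 5: 1600.0})
```

Here K = 3 wins on SC and K = 4 wins on CHS, but K = 4 has the largest combined score (about 1.71), so `best` is 4. Rescaling first matters because CHS is unbounded while SC lies in [-1, 1]; without it, CHS would dominate the sum.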
Figure 5: (a) SC of clustering models; (b) CHS of clustering models.
Finally, HSR disturbances were classified into four categories using the K-Means clustering algorithm, as shown in Figure 6. Based on the distribution of each cluster, they can be characterized as follows:
(i) Cluster A: disturbances occurred between 7:00 and 23:30; train intervals range from 3 to 13 minutes; lengths range from 5 to 14 minutes.
(ii) Cluster B: disturbances occurred between 7:30 and 23:30; train intervals range from 5 to 25 minutes; lengths range from 14 to 31 minutes.
(iii) Cluster C: disturbances occurred between 7:30 and 23:30; train intervals range from 4 to 30 minutes; lengths are longer than 31 minutes.
(iv) Cluster D: disturbances occurred between 10:30 and 23:30; train intervals range from 13 to 30 minutes; lengths range from 5 to 22 minutes.
The statistics of NAT and TDT of each cluster are shown in Table 3.

5. Estimating Models of NAT and TDT
5.1. Candidate Models
To reveal the patterns of disturbance influence, we first examined the histograms of NAT and TDT shown in Figures 7 and 8, which indicate that both NAT and TDT have right-skewed distributions. Because the locations and shapes of the histograms differ considerably, we fitted the data with five common parametric distributions and two kernel functions as candidate models. The Lognormal, Weibull, Gamma, Exponential, and Logistic distributions and the Gaussian and Epanechnikov kernels were used to fit the data from the parametric and nonparametric perspectives, respectively. The probability density functions and kernel functions of these models are as follows [24].
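(i) Lognormal distribution:
$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0,$$
where $\sigma > 0$ is the shape parameter and $\mu$ is the location parameter (both refer to $\ln x$).
(ii) Weibull distribution:
$$f(x; \lambda, k) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \quad x \ge 0,$$
where $\lambda > 0$ is the scale parameter and $k > 0$ is the shape parameter.
(iii) Gamma distribution:
$$f(x; \alpha, \beta) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\, \beta^{\alpha}}, \quad x > 0,$$
where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter.
(iv) Exponential distribution:
$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0,$$
where $\lambda > 0$ is the rate parameter.
(v) Logistic distribution:
$$f(x; u, s) = \frac{e^{-(x - u)/s}}{s \left(1 + e^{-(x - u)/s}\right)^{2}},$$
where $u$ is the location parameter and $s > 0$ is the scale parameter.
(vi) Gaussian kernel:
$$K(v) = \frac{1}{\sqrt{2\pi}} e^{-v^{2}/2}.$$
(vii) Epanechnikov kernel:
$$K(v) = \frac{3}{4}\left(1 - v^{2}\right) \text{ for } |v| \le 1, \qquad K(v) = 0 \text{ otherwise}.$$
Both kernels enter the kernel density estimate $\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$, where $x_1, \ldots, x_n$ are the observed samples and $h > 0$ is the bandwidth.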
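The two nonparametric estimators above can be sketched in NumPy as a kernel density estimate with a pluggable kernel. The source does not state how the bandwidth h was chosen, so `h` is left as an explicit argument (for reference, `scipy.stats.gaussian_kde` defaults to Scott's rule). The function names are illustrative.

```python
import numpy as np


def gaussian_kernel(v):
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def epanechnikov_kernel(v):
    # 3/4 (1 - v^2) inside [-1, 1], zero outside.
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v ** 2), 0.0)


def kde(samples, grid, h, kernel=gaussian_kernel):
    """Evaluate f_h(x) = 1/(n h) * sum_i K((x - x_i) / h) at each grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    v = (grid[:, None] - samples[None, :]) / h  # shape (len(grid), n)
    return kernel(v).sum(axis=1) / (len(samples) * h)
```

A useful sanity check is that the estimate integrates to about 1 over a grid that covers the samples plus a few bandwidths on either side. Because both kernels are symmetric, the KDE also spreads mass below zero near small values; for strictly positive quantities such as TDT, this boundary effect is one reason to prefer a parametric model once one fits well.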
The parameters were estimated by maximum likelihood and are listed in Table 4; the fitted distributions are shown in Figures 7 and 8. These figures show that all the candidate parametric models follow the shape of the nonparametric estimates obtained with the Gaussian and Epanechnikov kernels, so the best model can be chosen from the parametric candidates as the estimating model of disturbance influence.
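Maximum likelihood fitting of the five candidates can be sketched with SciPy, assuming SciPy is available. Fixing the location at zero (`floc=0`) so that the fits match the two-parameter forms above is an assumption, not something the source states. SciPy's parameterization maps to the symbols above as follows: `lognorm` has `s` = σ and `scale` = exp(μ); `weibull_min` has `c` = k and `scale` = λ; `gamma` has `a` = α and `scale` = β; `expon` has `scale` = 1/λ; and `logistic` has `loc` = u and `scale` = s.

```python
import numpy as np
from scipy import stats

# Candidate families and any fixed parameters passed to .fit().
# floc=0 pins the location at zero (assumption); the logistic keeps a free u.
CANDIDATES = {
    "lognormal": (stats.lognorm, {"floc": 0}),
    "weibull": (stats.weibull_min, {"floc": 0}),
    "gamma": (stats.gamma, {"floc": 0}),
    "exponential": (stats.expon, {"floc": 0}),
    "logistic": (stats.logistic, {}),
}


def fit_candidates(data):
    """Return {name: frozen distribution} fitted by maximum likelihood."""
    data = np.asarray(data, dtype=float)
    fitted = {}
    for name, (family, fixed) in CANDIDATES.items():
        params = family.fit(data, **fixed)  # MLE: (shape(s)..., loc, scale)
        fitted[name] = family(*params)      # frozen distribution for pdf/cdf
    return fitted
```

The frozen distributions expose `pdf` for plotting against the histograms and `cdf` for the goodness-of-fit tests that follow.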

5.2. Goodness-of-Fit Test
In this section, we evaluated the goodness-of-fit of the distribution models using the Kolmogorov-Smirnov (KS) method [25] and selected the optimal distribution model for each category.
The KS test checks whether a sample follows a given theoretical distribution; its null hypothesis is that it does. The test statistic is the largest absolute difference between the empirical cumulative distribution function of the sample and that of the theoretical distribution:
$$D = \max_{x} \left| F_n(x) - F(x) \right|, \qquad (12)$$
where $F_n(x)$ is the empirical cumulative distribution function of the samples (NAT or TDT) and $F(x)$ is the cumulative distribution function of the theoretical distribution (one of the five candidate models). Because the number of trains is an integer and the historical train operation data were recorded at one-minute resolution, we added uniformly distributed random noise to the samples to satisfy the KS test's requirement of a continuous distribution. We chose the significance level $\alpha = 0.05$; when the sample size is large enough (over 50), the critical value of $D$ is
$$D_{0.05} = \frac{1.36}{\sqrt{n}}, \qquad (13)$$
where $n$ is the sample size.
According to the rules of the KS test, if $D < D_{0.05}$, the null hypothesis is not rejected, and the sample is considered to follow the theoretical distribution. The smaller D is, the closer the sample is to the theoretical distribution. The KS test results for all the models are shown in Table 5.
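The test procedure above (jitter the discrete data, compute D against a fitted CDF, compare with 1.36/√n) can be sketched in plain Python. The source does not say which uniform interval was used for the noise, so adding U(0, 1) to each value is an assumption, as are the helper names. SciPy's `scipy.stats.kstest` computes the same statistic.

```python
import math
import random
from typing import Callable, List, Sequence


def jitter(values: Sequence[float], seed: int = 0) -> List[float]:
    """Spread integer / whole-minute values with U(0, 1) noise (assumed scheme)."""
    rng = random.Random(seed)
    return [v + rng.random() for v in values]


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D = max_x |F_n(x) - F(x)|, checked on both sides of each ECDF jump."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_value(n: int) -> float:
    """Large-sample critical value of D at alpha = 0.05."""
    return 1.36 / math.sqrt(n)


def lognormal_cdf(mu: float, sigma: float) -> Callable[[float], float]:
    def cdf(x: float) -> float:
        if x <= 0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))
    return cdf
```

A candidate passes when `ks_statistic(jitter(data), cdf) < ks_critical_value(len(data))`. Checking both `i/n - F(x)` and `F(x) - (i-1)/n` matters because the empirical CDF jumps at each sample point, and the largest gap can lie on either side of the jump.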
 
A superscript mark indicates the best model.
Finally, we chose the model that passes the KS test and has the smallest D as the distribution model of each disturbance cluster. The parameters of distribution models for NAT and TDT of each cluster are shown in Table 6.
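The selection rule above (among the candidates that pass the KS test, take the one with the smallest D) can be sketched as follows. The D values in the usage example are hypothetical and are not taken from Table 5.

```python
import math
from typing import Dict, Optional


def select_model(d_values: Dict[str, float], n: int) -> Optional[str]:
    """Smallest-D candidate among those with D < 1.36 / sqrt(n); None if none pass."""
    critical = 1.36 / math.sqrt(n)
    passing = {name: d for name, d in d_values.items() if d < critical}
    if not passing:
        return None
    return min(passing, key=passing.get)


# Hypothetical KS statistics for one cluster with n = 400 (critical value 0.068):
example = {"lognormal": 0.031, "gamma": 0.027, "weibull": 0.045,
           "exponential": 0.120, "logistic": 0.090}
best = select_model(example, 400)
```

Here three candidates pass, and `best` is `"gamma"`. Returning `None` when no candidate passes makes that case explicit instead of silently choosing a model that the test rejects.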

5.3. Generalization Test
To investigate the generalization of the fitted distribution models, we validated them using disturbances on the WG HSR line from the following nine months. We first fed these disturbances into the trained K-Means clustering model to obtain the cluster label of each sample. Each validation cluster was then compared with the distribution model fitted to the training cluster with the same label. The fitting results are shown in Figures 9 and 10, and the descriptive statistics and KS test results are shown in Table 7.
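Labeling validation disturbances with a trained K-Means model amounts to a nearest-centroid assignment in the standardized feature space, as in the NumPy sketch below. Two assumptions are involved: the feature column order [I, T, L], and standardizing the new data with the training-set mean and standard deviation. Re-standardizing with validation statistics would shift the cluster boundaries.

```python
import numpy as np


def assign_clusters(X_new, centers, mean, std):
    """Label new disturbances with the nearest trained K-Means centre.

    X_new: raw features, one row per disturbance, columns [I, T, L] (assumed order).
    centers: trained centres in standardized units.
    mean, std: statistics from the *training* set, reused unchanged.
    """
    Z = (np.asarray(X_new, dtype=float) - mean) / std
    dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```

Each validation group produced this way can then be tested with the KS procedure against the distribution fitted to the training cluster with the same label.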

The test results indicate that all the fitted models pass both the goodness-of-fit and the generalization tests. The fitted models therefore capture the distributional patterns of NAT and TDT and can support real-time train dispatching.
6. Conclusion
This paper presents a data-mining approach, composed of a K-Means clustering model and probability density fitting techniques, to estimate the NAT and TDT resulting from HSR operation disturbances, taking timetable and disturbance characteristics into account. The proposed clustering approach, which considers both disturbance characteristics and timetable structure, addresses a shortcoming of existing cause-based classifications of railway disturbances. In real-time operation, once a disturbance occurs, its features (train interval, disturbance length, and occurrence time) can be fed into the trained K-Means model to obtain its cluster label. The distribution model corresponding to that label can then be used to calculate the probabilities of different outcomes. Because these probability models are fitted to real operation data, they give dispatchers real-time estimates of the probabilities of NAT and TDT for any train operation disturbance. The influence patterns of disturbances derived from train operation data can thus improve rescheduling and adjustment in disturbed situations and help dispatchers make better decisions by providing accurate estimates of NAT and TDT. These probability models can also improve the modeling of train operation and disturbance management in simulation systems. They are more accurate than hypothetical models, which often rely on oversimplified assumptions, ignore some operational situations and constraints, and thus create gaps between simulation and practice.
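The real-time workflow described above ends with a probability query against the distribution for the disturbance's cluster. The sketch below assumes lognormal models for TDT in every cluster and uses made-up (μ, σ) values. These are placeholders, not the parameters in Table 6, and the paper selects Gamma for some clusters, which would require a Gamma survival function (for example, `scipy.stats.gamma.sf`). The cluster label is assumed to come from the trained K-Means model.

```python
import math
from typing import Dict, Tuple

# Hypothetical lognormal (mu, sigma) for TDT in minutes, per cluster label.
# Placeholders only, NOT the fitted values reported in Table 6.
TDT_PARAMS: Dict[str, Tuple[float, float]] = {
    "A": (3.5, 0.8),
    "B": (4.5, 0.9),
    "C": (5.5, 1.0),
    "D": (3.0, 0.7),
}


def lognormal_sf(x: float, mu: float, sigma: float) -> float:
    """P(X > x) = 0.5 * erfc((ln x - mu) / (sigma * sqrt(2)))."""
    if x <= 0:
        return 1.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def exceedance_probability(label: str, threshold: float) -> float:
    """Probability that a disturbance in `label` causes TDT above `threshold` minutes."""
    mu, sigma = TDT_PARAMS[label]
    return lognormal_sf(threshold, mu, sigma)
```

For example, `exceedance_probability("B", 120.0)` gives a dispatcher the estimated chance that a cluster B disturbance produces more than two hours of total delay. The median TDT of a lognormal cluster is exp(μ), where the exceedance probability is exactly 0.5.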
The established distribution models are general models for the WG and XS HSR lines. However, train services and infrastructure can vary across stations, leading to differences in distribution model parameters. Our future work will therefore focus on establishing models for each station on the HSR network.
Data Availability
The data used to support the findings of this study are restricted by the China Railway Corporation. Data are available from Ping Huang, huangping129@my.swjtu.edu.cn, only for researchers who use them to review or better understand this paper.
Conflicts of Interest
No potential conflicts of interest were reported by the authors.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [Grant no. 71871188]; the Science & Technology Department of Sichuan Province [Grant no. 2018JY0567]; and the Doctoral Innovation Fund Program of Southwest Jiaotong University. We are grateful for the contributions made by our project partners.
References
[1] H. Khadilkar, “Data-enabled stochastic modeling for evaluating schedule robustness of railway networks,” Transportation Science, vol. 51, no. 4, pp. 1161–1176, 2017.
[2] J. Jespersen-Groth, D. Potthoff, J. Clausen et al., “Disruption management in passenger railway transportation,” in Robust and Online Large-Scale Optimization, vol. 5868 of Lecture Notes in Computer Science, pp. 399–421, Springer, Berlin, Heidelberg, Germany, 2009.
[3] J. Lessan, L. Fu, C. Wen, P. Huang, and C. Jiang, “Stochastic model of train running time and arrival delay: a case study of Wuhan–Guangzhou high-speed rail,” Transportation Research Record, 2018.
[4] J. Wallander and M. Mäkitalo, “Data mining in rail transport delay chain analysis,” International Journal of Shipping and Transport Logistics, vol. 4, no. 3, pp. 269–285, 2012.
[5] L. E. Peppard and V. Gourishankar, “Optimal control of a string of moving vehicles,” IEEE Transactions on Automatic Control, vol. 15, no. 3, pp. 386–387, 1970.
[6] N. O. E. Olsson and H. Haugland, “Influencing factors on train punctuality: results from some Norwegian studies,” Transport Policy, vol. 11, no. 4, pp. 387–397, 2004.
[7] M. Hartrumpf, T. Claus, M. Erb, and J. M. Albes, “Surgeon performance index: tool for assessment of individual surgical quality in total quality management,” European Journal of Cardio-Thoracic Surgery, vol. 35, no. 5, pp. 751–758, 2009.
[8] A. Higgins, E. Kozan, and L. Ferreira, “Modelling delay risks associated with train schedules,” Transportation Planning and Technology, vol. 19, no. 2, pp. 89–108, 1995.
[9] S. Milinković, M. Marković, S. Vesković, M. Ivić, and N. Pavlović, “A fuzzy Petri net model to estimate train delays,” Simulation Modelling Practice and Theory, vol. 33, pp. 144–157, 2013.
[10] N. Hasan, “Direct fixation fastener (DFF) spacing and stiffness design,” in Proceedings of the 2011 Joint Rail Conference, JRC 2011, pp. 11–17, USA, March 2011.
[11] J. Yuan, R. Goverde, and I. Hansen, “Propagation of train delays in stations,” WIT Transactions on the Built Environment, vol. 61, 2002.
[12] A. Higgins and E. Kozan, “Modeling train delays in urban networks,” Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.
[13] C. Wen, Z. Li, J. Lessan, L. Fu, P. Huang, and C. Jiang, “Statistical investigation on train primary delay based on real records: evidence from Wuhan–Guangzhou HSR,” International Journal of Rail Transportation, vol. 5, no. 3, pp. 170–189, 2017.
[14] T. Takimoto, “Development of efficient operational control using object representation,” Computers in Railways VII, vol. 7, pp. 837–841, 2000.
[15] P. Xu, F. Corman, and Q. Peng, “Analyzing railway disruptions and their impact on delayed traffic in Chinese high-speed railway,” IFAC-PapersOnLine, vol. 49, no. 3, pp. 84–89, 2016.
[16] L. Zhang, J. Liu, R. Wu, and X. Gong, “Design of performance testing system for train air conditioning,” in Proceedings of the 2009 International Conference on Energy and Environment Technology, ICEET 2009, vol. 1, pp. 85–89, China, October 2009.
[17] P. Murali, M. Dessouky, F. Ordóñez, and K. Palmer, “A delay estimation technique for single and double-track railroads,” Transportation Research Part E: Logistics and Transportation Review, vol. 46, no. 4, pp. 483–495, 2010.
[18] R. M. Goverde, “Punctuality of railway operations and timetable stability analysis,” The National Academies of Sciences, Engineering, and Medicine, 2005.
[19] P. Kecman and R. M. P. Goverde, “Predictive modelling of running and dwell times in railway traffic,” Public Transport, vol. 7, no. 3, pp. 295–319, 2015.
[20] J. Lessan, L. Fu, and C. Wen, “A hybrid Bayesian network model for predicting delays in train operations,” Computers & Industrial Engineering, 2018.
[21] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: a k-means clustering algorithm,” Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, no. 1, pp. 100–108, 1979.
[22] F. Pedregosa, G. Varoquaux, and A. Gramfort, “Scikit-learn: machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[23] Y. Liu, Z. Li, H. Xiong, X. Gao, and J. Wu, “Understanding of internal clustering validation measures,” in Proceedings of the 10th IEEE International Conference on Data Mining, ICDM 2010, pp. 911–916, December 2010.
[24] S. M. Ross, Introduction to Probability Models, Elsevier/Academic Press, Amsterdam, Eleventh edition, 2014.
[25] F. J. Massey, “The Kolmogorov-Smirnov test for goodness of fit,” Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, 1951.
Copyright
Copyright © 2019 Ping Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.