Abstract

Passenger flow forecasting is essential to the organization of railway transportation and is one of the most important bases for decisions on transportation patterns and train operation planning. Passenger flow on high-speed railways shows quasi-periodic variation over short time spans and complex nonlinear fluctuation caused by many influencing factors. In this study, a fuzzy temporal logic based passenger flow forecast model (FTLPFFM) is presented. It uses fuzzy logic relationship recognition techniques to predict short-term passenger flow for high-speed railway, and it significantly improves forecast accuracy. A case study using real-world data illustrates the precision and accuracy of FTLPFFM. For this case, the proposed model performs better than the k-nearest neighbor (KNN) and autoregressive integrated moving average (ARIMA) models.

1. Introduction

High-speed railway, as a large-volume passenger transportation mode, is well developed in Europe and Japan, is being developed on an even larger scale in China, and is planned for the American continent. In these areas, high-speed railway serves as the backbone of the passenger transportation system. How to raise operational efficiency and how to make passenger service decisions more demand-responsive have been the main focus of the related research. As one of the most important bases for decisions on high-speed railway transportation patterns and train operation planning, passenger flow forecasting is essential, and short-term passenger flow forecasting is the key to successful daily operation management.

Recently, many forecast techniques have been applied to prediction problems. Lin and Yang applied the grey forecasting model to forecast the output value of Taiwan’s optoelectronics industry accurately from 2000 to 2005 [1]. In [2], four models were developed and tested for the freeway traffic flow forecasting problem: the historical average, time-series, neural network, and nonparametric regression models. The nonparametric regression model significantly outperformed the other models. Du and Ren [3] proposed a prediction model of train passenger flow volume, based on industrial economic indexes and Cobb-Douglas theory, to help the railway administration analyze running strategies. In particular, the ARIMA model has been one of the most common parametric forecasting approaches since the 1970s. The ARIMA model is a linear combination of time-lagged variables and error terms, and it has been widely applied to forecasting short-term traffic data such as traffic flow, travel time, and speed. In [4], time series of traffic flow data were characterized by definite periodic cycles, and seasonal ARIMA and Winters exponential smoothing models were developed. In [5], the theoretical basis for modeling univariate traffic condition data streams as seasonal ARIMA processes was presented. In [6], Hamed et al. developed time-series models for forecasting traffic volume in urban arterials, and the Box-Jenkins ARIMA model turned out to be the most adequate model for reproducing all original time series. As stated by Brooks, ARIMA performs well and robustly in modeling linear and stationary time series [7]. However, the applications of ARIMA models are limited because they assume linear relationships among time-lagged variables and cannot capture nonlinear structure [8].

Nonparametric regression models have also been applied to forecast transportation demand. Among these nonparametric techniques, however, the KNN method has rarely been adopted for transportation demand forecasting. Robinson and Polak proposed the use of the KNN technique to estimate urban link travel time with single-loop inductive loop detector data, and the optimized KNN model was found to provide more accurate estimates than other urban link travel time methods [9].

Neural network models have frequently been adopted for prediction. In [10], the time-delay recurrent neural network for temporal correlations and prediction and multiple recurrent neural networks were described, and the best performance was attained by the time-delay recurrent neural network. In [11], a hybrid EMD-BPN forecast approach, which combined empirical mode decomposition (EMD) and backpropagation neural networks (BPN), was developed to predict short-term passenger flow in metro systems. In [12], a forecast model of railway short-term passenger flow based on the BP neural network was established by analyzing the principle of the BP neural network and the time sequence characteristics of railway passenger flow. In [13], a neural network model was introduced that combines the predictions from single neural network predictors according to an adaptive and heuristic credit assignment algorithm based on the theory of conditional probability and Bayes’ rule. In [14], Chen and Grant-Muller reported the application and performance of an alternative neural computing algorithm that involves “sequential or dynamic learning” of the traffic flow process, which indicated the potential suitability of dynamic neural networks for traffic flow data. In [15], Li and Chong-Xin applied chaos theory to forecasting. Delay time and embedding dimension were calculated to reconstruct the phase space and determine the structure of the artificial neural network, and load data from the Shanxi province power grid of China were used to show that the model is more effective than the classical standard BP neural network model.

Support vector machine techniques have also been adopted in forecasting. In [16], a modified version of support vector machine for regression, a pattern recognition technique, was presented to forecast annual average daily traffic. Hu et al. used the theory and method of support vector machine regression to establish a regression model based on the least squares support vector machine. By predicting passenger flow on the Hangzhou highway in 2000–2008, the authors showed that the least squares support vector machine regression model had much higher prediction accuracy and reliability [17].

High-speed railway passenger flow forecasting is vitally important to the organization of high-speed railway. However, few studies have focused on forecasting short-term high-speed railway passenger flow on the basis of the regularity and randomness of the passenger flow rate. A new method is therefore needed. A fuzzy temporal logic based passenger flow forecast model (FTLPFFM) is proposed in this paper. The quasi-periodic variation of high-speed railway passenger flow is reflected in the model, and its nonlinear fluctuation is handled using fuzzy logic relationship recognition techniques in the searching process. The proposed model has explicit physical meaning: it reflects the variation of high-speed railway passenger flow and is comprehensible and interpretable. The characteristics of short-term high-speed railway passenger flow are used to improve the predictive performance of the fuzzy k-nearest neighbor approach, which is compared with other predictive methods for short-term high-speed railway passenger flow forecasting.

The remainder of this paper is organized as follows. In Section 2, the passenger flow characteristics of the high-speed railway and the passenger flow variation between adjacent periods are summarized. In Section 3, the degree of passenger flow change is divided into eight grades according to cognitive habit, and the passenger flow change rate is fuzzified. FTLPFFM is proposed in Section 4. In Section 5, the experimental results of FTLPFFM are compared with those of the ARIMA and KNN models using three statistics: mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). FTLPFFM appears to be more robust and to fit more generally. The last section presents the conclusion and future work.

2. Passenger Flow Feature Extraction

In short-term passenger flow forecasting, the characteristics of high-speed railway passenger flow are summarized with respect to time because passenger flow is strongly correlated with time. The data were collected for travel from Beijingnan Railway Station to Jinanxi Railway Station: hourly passenger flow from 26 March to 4 April 2012 (see Figure 1) and daily passenger flow from 14 May to 31 July 2012 (see Figure 2).

Two characteristics of high-speed railway passenger flow are taken into account in FTLPFFM. The first is quasi-periodicity, which has a great impact on passenger flow forecasting. High-speed trains run between 6:00 and 24:00, and passenger flow in the morning and evening peaks is higher than in other periods, as shown in Figure 1. In addition, passenger flow is usually stable from Saturday to Wednesday, increases on Thursday, and peaks on Friday, as shown in Figure 2. Therefore, high-speed railway passenger flow fluctuates with cycles of one day and one week. The second characteristic is nonlinear fluctuation, which also has a great impact on passenger flow forecasting. Specifically, the change rate of passenger flow is unstable over short time spans because of many influencing factors, such as passengers’ income, travel cost, and service quality of transportation, as shown in Figures 1 and 2.

3. Regularity of Passenger Flow

Notation:: the passenger flow in period ,: the total number of points of the historical passenger flow series,: the current passenger flow state,: the passenger flow change rate from to ,: the interval of passenger flow change rate,: the intermediate value of ,  .The history passenger flow series is denoted by . The passenger flow change rates between adjacent periods are taken into account, and then the passenger flow change rates are analyzed and variation of passenger flow in adjacent period is summed up.

3.1. Change Rate of Passenger Flow

In order to express passenger flow trend in adjacent period clearly and more accurately, passenger flow change rate is normalized.

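Define the standardized passenger flow change rate $r_t$ as the change in passenger flow from period $t$ to period $t+1$ divided by the maximum absolute change in passenger flow between adjacent periods, so that $r_t \in [-1, 1]$. For $r_t < 0$, the passenger flow decreases from period $t$ to $t+1$; for $r_t > 0$, the passenger flow increases from period $t$ to $t+1$; for $r_t = 0$, the passenger flow does not change from period $t$ to $t+1$.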

In Table 1, the data are collected from Beijingnan Railway Station to Jinanxi Railway Station in Beijing-Shanghai high-speed railway. For example, the maximum value of the passenger flow change in adjacent periods is calculated as ; the passenger flow change rate from 8:00–8:30 to 8:30–9:00 on October 10th is calculated as . Similarly, we can calculate the passenger flow change rates, which are 0.231, 0.5158, −0.8145, and so forth, as shown in Table 1.
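As an illustration, the normalization described above can be sketched in Python. The sketch assumes that the standardized change rate is the change between adjacent periods divided by the largest absolute change in the series, which keeps every rate in [−1, 1]. The function name `change_rates` and the sample flows are hypothetical and are not taken from Table 1.

```python
def change_rates(flows):
    """Standardized change rates between adjacent periods (assumed definition).

    Each rate is (flow[t+1] - flow[t]) divided by the largest absolute
    adjacent change in the series, so all rates lie in [-1, 1].
    """
    if len(flows) < 2:
        raise ValueError("at least two periods are needed")
    diffs = [nxt - cur for cur, nxt in zip(flows, flows[1:])]
    max_abs = max(abs(d) for d in diffs)
    if max_abs == 0:
        # A constant series has no change in any period.
        return [0.0 for _ in diffs]
    return [d / max_abs for d in diffs]


# Hypothetical passenger counts for four consecutive periods.
rates = change_rates([100, 150, 130, 130])
print(rates)
```

A negative rate marks a decrease, a positive rate an increase, and zero no change, matching the sign convention in the text.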

3.2. Variation of Passenger Flow

In order to reveal the regularity of the passenger flow trend clearly and express varying degrees of passenger flow change, respectively, we divide passenger flow change rate into eight intervals applying Zadeh’s fuzzy set theory [18].

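Define the universe of discourse $U = [-1, 1]$ and partition it into eight equal-length intervals $u_1 = [-1, -0.75]$, $u_2 = [-0.75, -0.5]$, $u_3 = [-0.5, -0.25]$, $u_4 = [-0.25, 0]$, $u_5 = [0, 0.25]$, $u_6 = [0.25, 0.5]$, $u_7 = [0.5, 0.75]$, and $u_8 = [0.75, 1]$. The midpoints of these intervals are $-0.875$, $-0.625$, $-0.375$, $-0.125$, $0.125$, $0.375$, $0.625$, and $0.875$. Define fuzzy sets $A_i$ on these intervals; each fuzzy set $A_i$ denotes a linguistic value of the passenger flow change, $i = 1, 2, \ldots, 8$.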

The notations $A_1$, $A_2$, $A_3$, and $A_4$ denote that the passenger flow decrease is too large, larger, microlarge, and less, respectively. Likewise, the notations $A_5$, $A_6$, $A_7$, and $A_8$ denote that the passenger flow increase is less, microlarge, larger, and too large, respectively.

Eight membership functions in this paper sufficiently reflect quasi-periodic variation of high-speed railway passenger flow, and the forecast result of FTLPFFM has better accuracy based on eight membership functions. Define the fuzzy membership function of subset , namely,

Different passenger flow change rates can be fuzzified into corresponding fuzzy sets. For example, as seen in Table 1, the passenger flow change rate from 7:00–8:00 to 8:00–9:00 is 0.273, which is fuzzified to $A_6$. The passenger flow change rate from 8:00–9:00 to 9:00–10:00 is 0.231, which is fuzzified to $A_5$. The passenger flow change rate from 9:00–10:00 to 10:00–11:00 is 0.5158, which is fuzzified to $A_7$. The passenger flow change rate from 10:00–11:00 to 11:00–12:00 is −0.8145, which is fuzzified to $A_1$. The fuzzification process is depicted in Figure 3. Some fuzzified passenger flow change rates are listed in Table 1.
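The fuzzification step can be illustrated with a minimal Python sketch. It assumes that the universe of discourse is [−1, 1], that the eight intervals have equal length 0.25, and that a change rate is assigned crisply to the interval that contains it (a rate on an interior boundary goes to the upper interval). The membership functions of the paper are not reproduced here, and the names `fuzzify` and `midpoint` are hypothetical.

```python
N_SETS = 8
LOW, HIGH = -1.0, 1.0
WIDTH = (HIGH - LOW) / N_SETS  # 0.25 under the assumed universe [-1, 1]


def fuzzify(rate):
    """Return the index (1..8) of the interval that contains the change rate."""
    if not LOW <= rate <= HIGH:
        raise ValueError("change rate must lie in [-1, 1]")
    index = int((rate - LOW) // WIDTH) + 1
    # The upper end point 1.0 belongs to the last interval.
    return min(index, N_SETS)


def midpoint(index):
    """Return the midpoint of interval number index (1..8)."""
    return LOW + (index - 0.5) * WIDTH


# The four change rates quoted in the text.
labels = [fuzzify(r) for r in (0.273, 0.231, 0.5158, -0.8145)]
print(labels)
```

Under these assumptions, the four quoted rates fall in the sixth, fifth, seventh, and first intervals.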

Fuzzy logic relationships are established from two consecutive fuzzy sets, as follows: “$A_i \to A_j$” denotes that the fuzzified passenger flow change rate between one pair of adjacent periods is $A_i$ and that the fuzzified passenger flow change rate between the next pair of adjacent periods is $A_j$.

As seen in Figure 4, the fuzzified passenger flow change rate from 7:00–8:00 to 8:00–9:00 is $A_6$ and from 8:00–9:00 to 9:00–10:00 is $A_5$. Hence, we can establish a fuzzy logic relationship $A_6 \to A_5$. Likewise, from Table 1, we can establish fuzzy logic relationships such as $A_5 \to A_7$, $A_7 \to A_1$, and so forth. Some fuzzy logic relationships are listed in Table 2.
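Building the relationships from a sequence of fuzzified change rates can be sketched as follows. The sketch assumes that each fuzzy set is represented by its integer index (1 to 8) and that a relationship is simply an ordered pair of consecutive indices; the function names are hypothetical.

```python
from collections import defaultdict


def logic_relationships(labels):
    """Pair each fuzzified change rate with the one that follows it."""
    return list(zip(labels, labels[1:]))


def group_relationships(labels):
    """Group the right-hand sides of the relationships by their left-hand side."""
    groups = defaultdict(list)
    for left, right in zip(labels, labels[1:]):
        groups[left].append(right)
    return dict(groups)


# Indices of the fuzzy sets for four consecutive change rates (illustrative).
pairs = logic_relationships([6, 5, 7, 1])
print(pairs)
```

Grouping by the left-hand side makes it easy to look up which fuzzy sets have historically followed a given one.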

4. Fuzzy Temporal Logic Based Passenger Flow Forecast Model

Notation:: the number of the passenger flow change rate belonging to ,: the size of neighborhood,: the dimension of the current passenger flow change rate vector.

4.1. K-Nearest Neighbor Model

The k-nearest neighbor (KNN) model is one of the best-known statistical pattern recognition models. The KNN model defines neighborhoods as those cases with the least distance to the input state [19]. The literature indicates that Euclidean distance is usually used to determine the distance between the input state and the cases in the database [20]. Once the neighborhood is obtained, predictions are calculated by averaging the observed output values for the cases that fall within it.

For example, consider a passenger flow series of $n$ points. We search the series to find the $k$ nearest neighbors of the current state and then predict the next value on the basis of the values that followed those neighbors in the series. The value of $k$ in the KNN model is usually obtained by empirical analysis. In general, the steps of the KNN model are as follows.

Step 1. Identify the neighborhood size $k$ and the original state of the variables.

Step 2. Input all original states of the variables into the development database.

Step 3. Calculate the Euclidean distance from the current state of the variables to each state in the development database.

Step 4. Choose the outputs of the $k$ nearest neighbors from the development database on the basis of the shortest Euclidean distance.

Step 5. Calculate the predictive value, which is the average of the outputs of the $k$ nearest neighbors.
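The five steps above can be sketched in Python for a univariate series. The sketch assumes that a state is a window of `d` consecutive values and that the output of a state is the value that follows the window; the function name `knn_forecast` and the sample series are hypothetical.

```python
import math


def knn_forecast(series, k, d):
    """Predict the next value as the mean output of the k nearest states.

    A state is a window of d consecutive values; its output is the value
    that immediately follows the window in the series.
    """
    if len(series) <= d:
        raise ValueError("series must contain more than d points")
    current = series[-d:]
    candidates = []
    for end in range(d, len(series)):
        window = series[end - d:end]
        # Euclidean distance between a historical state and the current state.
        distance = math.dist(window, current)
        candidates.append((distance, series[end]))
    # Keep the k states with the shortest distance (ties keep series order).
    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical periodic series: the state (1, 2) was always followed by 3.
series = [1, 2, 3, 1, 2, 3, 1, 2]
prediction = knn_forecast(series, k=2, d=2)
print(prediction)
```

If fewer than `k` historical states exist, the sketch averages over those that are available.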

4.2. Fuzzy Temporal Logic Based Passenger Flow Forecast Model

Suppose is the -period historical passenger flow state vector and is the historical passenger flow change rate vector. For , and are the current passenger flow state vector and the current passenger flow change rate vector.

4.2.1. Distance Metric

Give the state matrix of passenger flow and the matrix of the passenger flow change rate so as to compare the relationship among the different periods of passenger flow more clearly. The state matrix of passenger flow is given by

The matrix of the passenger flow change rate is given by

A common approach to measure the “nearness” in KNN model is to use the Euclidean distance [18]. Therefore, the Euclidean distances of the passenger flow state vectors and the passenger flow change rate vectors are as follows:

4.2.2. Forecast Passenger Flow Change Rate

Suppose the neighborhood search procedure identifies neighbors, the passenger flow state vectors of the neighbors are () and is next to , and is the current passenger flow state vector. The passenger flow change rates corresponding to and are , . The number of the passenger flow change rate belonging to is , and the value of corresponding to is .

An approach to forecasting is to compute an average of s of the neighbors that have fallen within the neighborhood:
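One plausible reading of this averaging step is sketched below in Python. Equation (7) is not reproduced here, so the sketch rests on two assumptions: each neighbor contributes the fuzzy set of the change rate that followed it, and the forecast is the average of the corresponding interval midpoints weighted by how many neighbors fall in each fuzzy set. The midpoints assume eight equal intervals on [−1, 1], and the name `forecast_change_rate` is hypothetical.

```python
from collections import Counter

# Midpoints of eight equal intervals on [-1, 1] (assumed partition).
MIDPOINTS = {i: -1.0 + (i - 0.5) * 0.25 for i in range(1, 9)}


def forecast_change_rate(neighbor_labels):
    """Average the interval midpoints, weighted by the count per fuzzy set."""
    if not neighbor_labels:
        raise ValueError("at least one neighbor is needed")
    counts = Counter(neighbor_labels)
    total = sum(counts.values())
    weighted = sum(MIDPOINTS[label] * n for label, n in counts.items())
    return weighted / total


# Four hypothetical neighbors whose next change rates fall in sets 6, 6, 5, 7.
estimate = forecast_change_rate([6, 6, 5, 7])
print(estimate)
```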

4.2.3. Steps of FTLPFFM

The establishment of FTLPFFM is based on the fuzzy k-nearest neighbor prediction method.

Steps of FTLPFFM are as follows.

Step 1. Start with a minimal neighborhood size, .

Step 2. Start with a minimal dimension of the current passenger flow change rate vector, .

Step 3. Start with period to predict passenger flow.

Step 4 (match to find the elementary neighbors). Find the nearest matches for the current passenger flow state vector by searching the passenger flow series using (5), and then sort them in ascending order. Suppose an index , for which the nearest matching passenger flow state vector is and the historical passenger flow change rate vector associated is . Here, the current passenger flow change rate vector is ; search the same fuzzy logical relationships for and for , and choose the top matches which are the elementary neighbors. The appropriate passenger flow change rate vectors of will be discussed below.

Step 5 (match to find the nearest neighbors). Find the nearest matches for by searching all the historical passenger flow change rate vectors using (6), and then sort them in ascending order and choose the top matches. They are the nearest neighbor passenger flow state vectors , and output and , .

Step 6. Estimate the passenger flow change rate using (7).

Step 7. Calculate predictive value of passenger flow and add it to the database; repeat Step 4 to Step 7 with regard to until , is the last period.

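Step 8. Calculate the RMSE between the actual values and the predicted values, which is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{x}_t - x_t\right)^2}, \quad (8)$$

where $\hat{x}_t$ is the predicted value of the actual value $x_t$ and $n$ is the number of predicted values.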

Step 9. Repeat Steps 3 to 8 for vector dimensions of .

Step 10. Repeat Steps 2 to 9 for neighborhood sizes of .

Step 11. Choose the optimal predictive values of passenger flow which yields minimal RMSE by optimizing the vector dimensions and the neighborhood.
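The outer search in Steps 1 to 3 and 8 to 11 can be sketched in Python as a grid search over the neighborhood size and the vector dimension that keeps the pair with the smallest RMSE. This is only an illustration under stated assumptions: a plain KNN forecaster stands in for the fuzzy matching of Steps 4 to 6, the observed value (rather than the predicted value of Step 7) is appended to the database after each forecast, and all function names and data are hypothetical.

```python
import math


def knn_forecast(history, k, d):
    """Plain KNN stand-in: mean successor of the k nearest d-length windows."""
    if len(history) <= d:
        raise ValueError("history must contain more than d points")
    current = history[-d:]
    candidates = []
    for end in range(d, len(history)):
        window = history[end - d:end]
        candidates.append((math.dist(window, current), history[end]))
    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rolling_rmse(train, test, k, d):
    """Forecast each test period in turn and score the whole run."""
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(knn_forecast(history, k, d))
        # Assumption: extend the database with the observed value.
        history.append(observed)
    return rmse(test, predictions)


def grid_search(train, test, max_k, max_d):
    """Return (best RMSE, k, d) over 1..max_k and 1..max_d."""
    best = None
    for k in range(1, max_k + 1):
        for d in range(1, max_d + 1):
            score = rolling_rmse(train, test, k, d)
            if best is None or score < best[0]:
                best = (score, k, d)
    return best


# Hypothetical periodic data for estimation and testing.
train = [1, 2, 3] * 4
test = [1, 2, 3]
best = grid_search(train, test, max_k=2, max_d=2)
print(best)
```

Ties are resolved in favor of the smallest neighborhood size and dimension because a later pair replaces the current best only when its RMSE is strictly lower.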

Choose the maximum dimension of the current passenger flow change rate vector and the maximum neighborhood size according to the characteristics of the passenger flow. Smith and Demetsky (1994) [20] found that the best predictions were generated using , and Karlsson and Yakowitz (1987) [21] proposed that the best forecast values were generated using . Wang et al. (2011) [22] and Oswald et al. (2001) [23] revealed that the best results were obtained when . We obtain the best predicted values of passenger flow as nearly all fall within the search space, which is and , by numerous experiments using different dataset.

5. Case Study

The data were obtained from the National Key Technology Research and Development Program, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. The database consisted of hourly passenger flow between 7:00 and 21:00 from Beijing to Jinan on the Beijing-Shanghai high-speed railway and was split into two parts: an estimation data set and a test data set. The estimation data set was collected from 1 July to 31 December 2011 (2576 observations), and the test data set was collected from 1 to 22 January 2012 (300 observations).

According to the passenger flow characteristics, we can set and . The developed model for the passenger flow of the high-speed railway was implemented using MATLAB version 7.1. The best results were obtained when and , which can be seen from RMSE performance, and RMSE = 2.7046. The best prediction results and actual values are shown in Figure 5.

The ARIMA model is a benchmark method in the forecasting field, but it is a gray box model that cannot reflect the underlying structural properties. The KNN model adapts dynamically to the data; it is a white box model and is sufficiently comprehensible. FTLPFFM is built on the KNN forecasting model and is likewise comprehensible and interpretable. Therefore, FTLPFFM is compared with the ARIMA and KNN models using three statistics, MAE, MAPE, and RMSE, as shown in Table 3. MAE and MAPE are computed as shown in (9):

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\hat{x}_t - x_t\right|, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{\hat{x}_t - x_t}{x_t}\right|, \quad (9)$$

where $\hat{x}_t$ is the predicted value, $x_t$ is the actual value, and $n$ is the number of predicted values.

The absolute error and the absolute relative deviation of three models are computed as shown in Figures 6 and 7.
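For reference, the three statistics used in the comparison can be computed as in the following Python sketch. It uses the standard definitions, with MAPE expressed in percent and the actual values assumed to be nonzero; the sample numbers are hypothetical and are not results from Table 3.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical hourly passenger counts and forecasts.
actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE are in passengers per period, whereas MAPE is scale-free, which is why the three are reported together.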

The comparison between the prediction results and the actual values indicates that the proposed model is effective and that its error is acceptable.

6. Conclusion and Future Work

Railway transport has become an increasingly popular mode for medium- and long-distance journeys in many countries in recent years. Short-term passenger flow forecasting plays a key role in high-speed railway intelligent transportation systems. In this paper, an FTLPFFM is developed to measure the uncertainty of high-speed railway passenger flow for railway passenger transport management. In FTLPFFM, past sequences of passenger flow are used to predict future passenger flow, with fuzzy logic relationship recognition techniques applied in the searching process. The results reveal that the forecast accuracy of FTLPFFM (measured with MAE, MAPE, and RMSE) was significantly better than that of the ARIMA and KNN models. FTLPFFM also provides a theoretical foundation for decision-making on resource allocation. More generally, the proposed method could be adapted to multimodal transportation systems, especially railway and metro transport. For future work, one possible extension of this research is to improve forecast accuracy by properly applying data fusion and pattern recognition techniques.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This project is supported by the National Natural Science Foundation of China (no. 61074151), the National Key Technology Research and Development Program of China (no. 2009BAG12A10), the National High Technology Research and Development Program 863 of China (no. 2012AA112001), and the Research Fund of Beijing Jiaotong University (no. T14JB00380), China.