Mathematical and Intelligent Techniques for Data Analytics in Science and Engineering
ARIMA-FSVR Hybrid Method for High-Speed Railway Passenger Traffic Forecasting
To improve the accuracy of railway passenger traffic prediction, this paper proposes a hybrid prediction method that combines an ARIMA model with fuzzy support vector regression (FSVR). An ARIMA model is first built from known railway passenger traffic data, and its predictions are then used as the training set of the FSVR model. Airfares and historical passenger traffic data are also introduced as inputs for predicting future passenger traffic, which yields the hybrid prediction of railway passenger traffic. A case study shows that the hybrid method effectively improves prediction performance. Compared with ARIMA alone, the hybrid method reduces the lag of the predictions; compared with FSVR alone, it greatly reduces the errors at the extreme points of passenger traffic and in long-term prediction. These results provide a useful reference for railway passenger traffic prediction.
At present, commonly used passenger flow prediction methods are based on historical data and include time-series methods, support vector machines, and neural networks [1–3]. For instance, Ni et al. applied the autoregressive integrated moving average (ARIMA) method to traffic flow prediction and showed that it can model nonstationary time series. Xie et al. designed a fuzzy time-series ARIMA method for long-term waterway traffic volume prediction. Li et al. proposed a robust ν-support vector regression (RSVR) method to forecast vessel traffic flow. Liu et al. adopted support vector machine (SVM) based regression to predict bus passenger flow in a target time window. Li et al. put forward a backpropagation neural network (BPNN) model with population per distance band for traffic flow prediction at urban rail transit stations. Hu et al. developed a re-sample recurrent neural network (RRNN) model to forecast passenger traffic on mass rapid transit systems.
Because each prediction method has its own advantages and disadvantages, a single prediction method often does not perform well. Organically combining two or more methods into a hybrid prediction method can overcome the deficiencies of a single method and improve passenger flow prediction [10, 11]. Khan et al. combined the wavelet transform (WT) with an artificial neural network (ANN) and ARIMA into a hybrid model for meteorological drought forecasting; the model inherits the merits of both WT and ANN-ARIMA. Wu et al. created a hybrid model of ARIMA and a wavelet neural network (WNN) combined with a genetic algorithm to predict river water quality. Yu et al. built a novel SVR-ANN combined model with ensemble empirical mode decomposition (EEMD) for rainfall prediction. Luo et al. developed a combined prediction model based on empirical mode decomposition, support vector regression, and a wavelet neural network (EMD-SVR-WNN) to forecast structural settlement and deformation. These models achieved satisfactory results. They indicate that SVR and neural networks are well suited to complex nonlinear problems, while time-series models have clear advantages for time-based prediction. However, neural network models have inherent defects, such as a tendency to become trapped in local optima and to overfit. Therefore, SVR and a time-series method are selected for the hybrid prediction in this paper.
In this paper, the autoregressive integrated moving average (ARIMA) model and fuzzy support vector regression (FSVR) are combined into a hybrid forecasting strategy for railway passenger flow, which is applied to the actual passenger flow of the Shanghai-Guangzhou high-speed railway. Support vector regression (SVR) is a general learning method based on statistical learning theory (SLT) for small samples. FSVR combines fuzzy mathematics with SVR: it introduces a fuzzy membership degree for each training sample, which improves generalization ability. According to time-series analysis theory, ARMA models apply to stationary time series, whereas passenger flow data are generally nonstationary and must first be made stationary by differencing. Therefore, the ARIMA model, which includes this differencing step, is used to predict passenger flow.
The autoregressive integrated moving average (ARIMA) model is an important method for studying time series. In ARIMA(p, d, q), AR denotes autoregression and p is the number of autoregressive terms, MA denotes the moving average and q is the number of moving average terms, and d is the number of differences required to make the series stationary. The ARIMA(p, d, q) model is an extension of the ARMA(p, q) model.
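The basic form of the ARMA model is
$$y_t = c + \sum_{i=1}^{p} \varphi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}, \tag{1}$$
where $c$ is a constant, $\varphi_i$ and $\theta_j$ are the autoregressive and moving average coefficients, $\varepsilon_t$ is a white noise sequence, $p$ is the autoregressive order, and $q$ is the moving average order.
After differencing, the basic form of the ARIMA model is
$$\Bigl(1 - \sum_{i=1}^{p} \varphi_i L^{i}\Bigr)(1 - L)^{d}\, y_t = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j L^{j}\Bigr)\varepsilon_t, \tag{2}$$
where $L$ is the lag operator ($L y_t = y_{t-1}$) and $d$ is the difference order, $d \in \mathbb{Z}^{+}$.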
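The differencing step $(1 - L)^d$, with $L$ the lag operator and $d$ the difference order, is what separates ARIMA from ARMA, and forecasts made on the differenced scale must be integrated back to passenger counts. The Python sketch below shows $d$-th order differencing and its inverse; the helper names `difference` and `undifference` are hypothetical and not taken from the paper.

```python
def difference(series, d=1):
    """Apply (1 - L)^d: each pass replaces y_t by y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def undifference(diffs, history, d=1):
    """Invert (1 - L)^d for d-th differences that continue `history`.

    `history` holds at least d original observations ending just before
    the first value that `diffs` describes.
    """
    # Last value of the series at each differencing level 0..d-1.
    lasts = []
    level = list(history)
    for _ in range(d):
        lasts.append(level[-1])
        level = [level[t] - level[t - 1] for t in range(1, len(level))]
    restored = []
    for delta in diffs:
        value = delta
        # Integrate from level d-1 down to the original scale (level 0).
        for k in range(d - 1, -1, -1):
            value += lasts[k]
            lasts[k] = value
        restored.append(value)
    return restored


# Usage: the second differences of a quadratic trend are constant.
y = [3, 5, 9, 15, 23]
print(difference(y, d=2))            # [2, 2, 2]
print(undifference([2, 2], y, d=2))  # [33, 45]
```

Keeping the last value of every intermediate level is what lets a $d > 1$ model be inverted one level at a time.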
3. Fuzzy Support Vector Regression
FSVR seeks a regression function that minimizes the prediction error. A nonlinear mapping $\Phi(\cdot)$ maps the data from the input space into a high-dimensional feature space H, and linear regression is performed in H, which amounts to nonlinear regression in the original low-dimensional space.
In practice, different data points contribute differently to the training result. To reduce the overfitting caused by noisy data, FSVR assigns each data point a fuzzy membership degree that weights its contribution and thereby lowers the influence of noise; the result is a training set with fuzzy memberships.
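For FSVR, let the training set be $S = \{(x_i, y_i, s_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{m}$ is an input vector, $y_i \in \mathbb{R}$ is the corresponding target value, and $\sigma \le s_i \le 1$ is the fuzzy membership of the $i$-th sample, with $\sigma > 0$ a sufficiently small constant. In a time-series problem, the membership degree is a function of the time index $t_i$. In this paper, the fuzzy membership function is a quadratic function of $t_i$, namely,
$$s_i = f(t_i) = a\,(t_i - h)^{2} + k. \tag{3}$$
The boundary conditions are
$$f(t_1) = \sigma, \qquad f(t_n) = 1. \tag{4}$$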
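A quadratic membership with only the two boundary conditions $f(t_1) = \sigma$ and $f(t_n) = 1$ still leaves one degree of freedom. The sketch below assumes the vertex form $f(t) = a(t - h)^2 + k$ with $h = t_1$, so that the oldest sample receives the minimum weight $\sigma$ and the weights grow toward the most recent day. This choice of $h$ and the default $\sigma = 0.1$ are assumptions for illustration, not values stated in the paper.

```python
def quadratic_membership(times, sigma=0.1):
    """Fuzzy memberships s_i = a * (t_i - h)**2 + k.

    Assumes h = t_1 (hypothetical choice). Then f(t_1) = sigma gives k = sigma,
    and f(t_n) = 1 gives a = (1 - sigma) / (t_n - t_1)**2.
    """
    if not 0 < sigma <= 1:
        raise ValueError("sigma must lie in (0, 1]")
    t1, tn = times[0], times[-1]
    if tn == t1:
        raise ValueError("need distinct first and last time indices")
    a = (1.0 - sigma) / (tn - t1) ** 2
    return [a * (t - t1) ** 2 + sigma for t in times]


# Usage: five consecutive days; the most recent day gets weight 1.
print(quadratic_membership([1, 2, 3, 4, 5], sigma=0.2))
```

A quadratic (rather than linear) profile keeps the weights of older days low and concentrates emphasis on the last few days.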
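FSVR solves the following quadratic programming problem:
$$\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} s_i\,(\xi_i + \xi_i^{*})\\
\text{s.t.}\quad & y_i - w^{\top}\Phi(x_i) - b \le \epsilon + \xi_i,\\
& w^{\top}\Phi(x_i) + b - y_i \le \epsilon + \xi_i^{*},\\
& \xi_i \ge 0,\ \xi_i^{*} \ge 0,\quad i = 1,\dots,n,
\end{aligned}\tag{5}$$
where $w$ is the weight vector of the regression hyperplane, $b$ is the bias term, $C > 0$ is the penalty parameter (a constant), $\epsilon$ is the width of the insensitive band around the regression hyperplane, $\xi_i$ and $\xi_i^{*}$ are slack variables, $s_i$ is the fuzzy membership, and $\Phi(\cdot)$ is the nonlinear mapping into the feature space H.
The dual form of problem (5) is
$$\begin{aligned}
\max_{\alpha,\,\alpha^{*}}\quad & -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \epsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*})\\
\text{s.t.}\quad & \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0,\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le s_i C,
\end{aligned}\tag{6}$$
where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers and $K(x_i, x_j) = \Phi(x_i)^{\top}\Phi(x_j)$ is the kernel function.
Solving the dual problem (6) gives the FSVR regression function
$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})K(x_i, x) + b. \tag{7}$$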
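The role of the memberships in the FSVR primal can be illustrated without a QP solver. The sketch below is a swapped-in technique, not the authors' solver: it assumes a linear kernel ($\Phi(x) = x$) and minimizes $\frac{1}{2}\lVert w\rVert^2 + C\sum_i s_i \max(0, |y_i - f(x_i)| - \epsilon)$, which is the slack-variable problem with the slacks eliminated, by subgradient descent with a diminishing step size. All parameter values are hypothetical.

```python
def fit_linear_fsvr(X, y, s, C=10.0, eps=0.01, lr=0.01, epochs=2000):
    """Subgradient descent on the FSVR primal with a linear kernel (assumed).

    Objective: 0.5 * ||w||^2 + C * sum_i s_i * max(0, |y_i - (w.x_i + b)| - eps).
    """
    m = len(X[0])
    w = [0.0] * m
    b = 0.0
    for epoch in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        grad_b = 0.0
        for xi, yi, si in zip(X, y, s):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(r) > eps:  # outside the eps-tube: the slack term is active
                sign = 1.0 if r > 0 else -1.0
                # The membership s_i scales this sample's pull on w and b.
                for j in range(m):
                    grad_w[j] -= C * si * sign * xi[j]
                grad_b -= C * si * sign
        step = lr / (1.0 + epoch) ** 0.5  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(w, b, x):
    """Regression function f(x) = w.x + b (the linear-kernel case)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Usage: noise-free line y = 2x + 1 with equal memberships.
X = [[i / 10] for i in range(11)]
y = [2 * row[0] + 1 for row in X]
w, b = fit_linear_fsvr(X, y, s=[1.0] * len(X))
print(w, b)  # both close to the true slope 2 and intercept 1
```

With a nonlinear kernel one would instead solve the dual with box constraints $0 \le \alpha_i, \alpha_i^{*} \le s_i C$; a per-sample weight in an off-the-shelf SVR plays the same role.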
The experimental data are the daily high-speed rail passenger flows between Shanghai and Guangzhou. A total of 176 days of samples were collected; the first 165 days are used to build the model, and the last 8 days are used as test samples for a comparative analysis of the predictions.
To reduce computational complexity and improve the accuracy of parameter selection, the raw data are normalized. Table 1 shows part of the passenger flow data.
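The paper does not state which normalization is used. Assuming min-max scaling to [0, 1], a minimal sketch with its inverse (needed to report predictions in passengers) looks as follows; the function names are hypothetical.

```python
def minmax_fit(values):
    """Return the (min, max) pair used for scaling; fit on training days only."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return lo, hi


def minmax_transform(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map normalized predictions back to passenger counts."""
    return [z * (hi - lo) + lo for z in scaled]


# Usage with made-up daily counts (not data from the paper).
train = [100, 150, 200]
lo, hi = minmax_fit(train)
scaled = minmax_transform(train, lo, hi)   # [0.0, 0.5, 1.0]
restored = minmax_inverse(scaled, lo, hi)  # [100.0, 150.0, 200.0]
```

Fitting `lo` and `hi` on the training days only keeps information from the test days out of the model.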
The ARIMA model is used to predict the passenger flow; the results are shown in Figure 1.
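The ARIMA forecasting step can be sketched with a deliberately simplified stand-in: an ARIMA(p, d, 0) model, that is, ARIMA without moving average terms, whose AR coefficients are fitted by ordinary least squares instead of maximum likelihood. The function names are hypothetical; for full ARIMA(p, d, q) estimation a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) would normally be used.

```python
def _solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def arima_pd0_forecast(series, p=1, d=1, steps=1):
    """Simplified ARIMA(p, d, 0): difference, fit AR(p) by OLS, integrate back."""
    # 1) Apply (1 - L)^d, remembering the last value at each level for step 4.
    lasts, z = [], list(series)
    for _ in range(d):
        lasts.append(z[-1])
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    if len(z) - p < p + 1:
        raise ValueError("fewer observations than AR parameters")
    # 2) OLS fit of z_t = c + sum_i phi_i * z_{t-i} via the normal equations.
    rows = [[1.0] + [z[t - i] for i in range(1, p + 1)] for t in range(p, len(z))]
    targets = z[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(k)]
    coef = _solve(xtx, xty)  # [c, phi_1, ..., phi_p]
    # 3) Recursive multi-step forecast on the differenced scale.
    hist, out = list(z), []
    for _ in range(steps):
        nxt = coef[0] + sum(coef[i] * hist[-i] for i in range(1, p + 1))
        hist.append(nxt)
        # 4) Integrate back to the original scale through every level.
        value = nxt
        for lvl in range(d - 1, -1, -1):
            value += lasts[lvl]
            lasts[lvl] = value
        out.append(value)
    return out


# Usage (illustrative numbers): two-day-ahead forecast of a short series.
print(arima_pd0_forecast([120, 126, 129, 135, 138, 144, 147, 153], p=1, d=1, steps=2))
```

Because multi-step forecasts feed on their own previous outputs, this recursion is one source of the lag behaviour discussed for ARIMA.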
As the prediction results in Figure 1 show, the ARIMA model can predict and analyze railway passenger traffic. The fluctuations of its predictions are consistent with the actual passenger traffic curve, but they lag noticeably behind it, which causes large prediction errors, so the prediction performance is not ideal.
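The lag of a forecast can be quantified. One simple heuristic, assumed here rather than taken from the paper, is to find the shift $k$ for which the prediction on day $t$ best matches the actual value on day $t - k$; a forecast that lags by $k$ days gives a near-zero error at that shift. The function name is hypothetical.

```python
def estimate_lag(actual, predicted, max_lag=3):
    """Return the shift k (in days) for which predicted[t] best matches actual[t - k].

    k = 0 means no lag; ties are resolved in favour of the smaller shift.
    """
    if max_lag >= len(actual):
        raise ValueError("max_lag must be smaller than the series length")
    best_k, best_err = 0, float("inf")
    for k in range(max_lag + 1):
        pairs = [(predicted[t], actual[t - k]) for t in range(k, len(actual))]
        err = sum(abs(p - a) for p, a in pairs) / len(pairs)
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Usage: a forecast that simply repeats yesterday's value lags by one day.
actual = [1, 4, 2, 6, 3, 7]
naive = [1] + actual[:-1]
print(estimate_lag(actual, naive))  # 1
```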
The FSVR passenger flow predictions are shown in Figure 2.
As the prediction results in Figure 2 show, FSVR has a strong nonlinear approximation ability and performs well in railway passenger traffic forecasting, especially for short-term forecasts. Its prediction error is small while passenger traffic keeps increasing or keeps decreasing. At extreme points, however, where the trend reverses (passenger traffic changes from increasing to decreasing or from decreasing to increasing), the prediction error is large. In other words, sharp fluctuations in passenger traffic reduce the generalization ability of FSVR and degrade its prediction performance.
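The error behaviour at extreme points can be checked by splitting the errors into those at turning points and those elsewhere. The sketch below assumes that an extreme point is any day on which the day-over-day change switches sign (flat segments are ignored); the helper names are hypothetical.

```python
def turning_points(series):
    """Indices t where the day-over-day change switches sign (local max or min)."""
    return [
        t for t in range(1, len(series) - 1)
        if (series[t] - series[t - 1]) * (series[t + 1] - series[t]) < 0
    ]


def _mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")


def error_split(actual, predicted):
    """Mean absolute error at turning points of `actual` versus at all other days."""
    tp = set(turning_points(actual))
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    at_tp = [e for t, e in enumerate(errors) if t in tp]
    other = [e for t, e in enumerate(errors) if t not in tp]
    return _mean(at_tp), _mean(other)


# Usage with made-up values: the forecast misses both reversals.
actual = [1, 3, 5, 4, 2, 3]
predicted = [1, 3, 4, 4, 3, 3]
print(turning_points(actual))          # [2, 4]
print(error_split(actual, predicted))  # (1.0, 0.0)
```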
Using the ARIMA forecasts above as input items of FSVR yields the hybrid forecast of railway passenger traffic. The results are shown in Figures 3 and 4.
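The source states that ARIMA forecasts, airfares, and historical passenger traffic feed the FSVR model but does not give the exact feature layout. The sketch below assumes one row per day consisting of the ARIMA forecast, the airfare, and the previous `n_lags` days of passenger traffic, with the observed traffic as the target; this layout and the function name are hypothetical.

```python
def build_hybrid_features(arima_pred, airfare, history, n_lags=1):
    """Assemble hypothetical FSVR inputs for the ARIMA-FSVR hybrid.

    Row for day t: [ARIMA forecast for t, airfare on t, y_{t-1}, ..., y_{t-n_lags}];
    the target is the observed passenger traffic y_t.
    """
    if not len(arima_pred) == len(airfare) == len(history):
        raise ValueError("inputs must be aligned day by day")
    X, y = [], []
    for t in range(n_lags, len(history)):
        lags = [history[t - k] for k in range(1, n_lags + 1)]
        X.append([arima_pred[t], airfare[t]] + lags)
        y.append(history[t])
    return X, y


# Usage with made-up normalized values.
X, y = build_hybrid_features(
    arima_pred=[0.10, 0.11, 0.12, 0.13],
    airfare=[0.5, 0.5, 0.6, 0.6],
    history=[0.09, 0.12, 0.11, 0.14],
    n_lags=2,
)
print(X)  # [[0.12, 0.6, 0.12, 0.09], [0.13, 0.6, 0.11, 0.12]]
print(y)  # [0.11, 0.14]
```

The resulting rows, together with memberships from the quadratic function, would then be passed to an FSVR trainer.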
As the prediction results in Figures 3 and 4 show, the hybrid method combines the advantages of the two methods and gives the best predictions. Compared with ARIMA, the lag of the hybrid predictions is greatly reduced; compared with FSVR, the predictions at extreme points are significantly better and the prediction error is greatly reduced.
To evaluate the proposed method, it is compared with two other hybrid methods, ARIMA-WNN and EMD-SVR-WNN. The results of the three hybrid methods are shown in Figure 5.
As Figure 5 shows, the ARIMA-WNN method is accurate early in the forecast period but gradually begins to lag after 4 days. The overall trend of the EMD-SVR-WNN method is consistent with the original data, but its predicted values are generally too low. The ARIMA-FSVR predictions are more accurate than those of both methods. The forecast error indexes of all methods are shown in Table 2.
Table 2 shows that the standard error of the ARIMA-FSVR predictions is smaller than that of the ARIMA and FSVR methods and also smaller than that of the other two hybrid methods. The correlation coefficient of the ARIMA-FSVR method is 0.9822, with a p value below 0.0001. Compared with the other four methods, ARIMA-FSVR has a larger correlation coefficient and a lower p value, which shows that it tracks the trend more accurately and can accurately predict railway passenger traffic.
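The paper does not define its "standard error". Assuming it means the root-mean-square error of the predictions, the two error indexes can be computed as follows. The p value of the correlation additionally requires the t distribution; for example, `scipy.stats.pearsonr` returns both the coefficient and its p value.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error (assumed meaning of the paper's 'standard error')."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Usage with made-up values.
actual = [1.0, 2.0, 3.0]
print(rmse(actual, [1.0, 2.0, 5.0]))       # sqrt(4/3), about 1.1547
print(pearson_r(actual, [2.0, 4.0, 6.0]))  # 1.0
```

Note that a forecast that is systematically too low (as described for EMD-SVR-WNN) can still have a high correlation coefficient, which is why both indexes are reported.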
The experimental results show that the ARIMA-FSVR method can accurately predict railway passenger traffic, handle complex nonlinear relationships, and obtain satisfactory prediction results.
In this paper, a new hybrid method was proposed that substantially improves both the prediction accuracy and the robustness of the single-item models:
(1) The ARIMA-FSVR hybrid prediction method overcomes the shortcomings of the single-item forecasting methods and reduces the lag of ARIMA.
(2) The ARIMA-FSVR hybrid prediction method mitigates the large errors of FSVR at extreme points.
(3) Empirical studies on real passenger flow data indicate that the ARIMA-FSVR hybrid method is clearly superior to the other benchmark hybrid models: it obtained the lowest prediction error, with higher accuracy and more reliable predictions.
In conclusion, the ARIMA-FSVR hybrid method can accurately predict railway passenger traffic: it overcomes the shortcomings of single-item forecasting methods while merging their advantages, which improves forecast accuracy. The method effectively handles the nonlinearity of railway traffic data and provides a new and effective approach to nonlinear prediction problems in practical applications.
The case analysis data used to support this study are available from the railway passenger transport department upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The project was supported by Science and Technology Research Project of Beijing-Shanghai High Speed Railway Co., Ltd. (Grant no. Beijing-Shanghai Scientific Research-2020-2), Scientific Research Projects of China Academy of Railway Sciences Co., Ltd. (Grant no. 2019YJ120), and Science and Technology Research and Development Plan of China Railway (Grant no. K2019X022).