Research Article  Open Access
Track Irregularity Time Series Analysis and Trend Forecasting
Abstract
The combination of linear and nonlinear methods is widely used in the prediction of time series data. This paper analyzes track irregularity time series data using gray incidence degree models and data transformation methods, seeking the underlying relationships among the time series. The GM(1,1) model, which is based on a first-order, single-variable linear differential equation, is adaptively improved and error-corrected and then used to predict the long-term trend of track irregularity at a fixed measuring point; the stochastic linear AR model, the Kalman filtering model, and an artificial neural network model are applied to predict the short-term trend of track irregularity over a unit section. Both the long-term and the short-term results show that the models are effective and achieve the expected accuracy.
1. Introduction
Track irregularity is a serious threat to the safety of train operation. Track irregularity data include environmental variables (gauge, longitudinal level, cross level, alignment, and twist) and effective variables (vertical acceleration and horizontal acceleration). The development of the track irregularity state is a random process that cannot be described by a deterministic function. Generally, it can be studied within a certain range by combining probability theory with analytical methods. Most current studies focus on overall indicators that evaluate changes of the track state, but few focus on the changes of specific geometric parameters and the laws behind them. This is a basic difficulty.
Linear and nonlinear methods are the two groups of models used to estimate time series. Deng Julong [1] proposed gray system theory in 1982. Gray system theory has been widely applied to control, forecasting, and decision making, and the GM model is its core. G. Liu and Yu [2] studied the main factors affecting MLF generation using the gray correlation coefficient method. Marcellino et al. [3] and Ding et al. [4] studied the autoregressive (AR) model for forecasting macroeconomic time series and for parameter estimation problems. AR is a principal model of random processes; it describes the target using only the historical values of the time series, without relying on independent explanatory variables, and so avoids the difficulties of variable selection and multicollinearity found in ordinary regression prediction. Kalman [5] proposed the Kalman filter model in 1960. Feil et al. [6] and Kandepu et al. [7] applied the Kalman model to monitoring process transitions and to nonlinear state estimation. Rumelhart and McClelland [8] studied neural networks early on. Balestrassi et al. [9] studied neural network training for nonlinear time series forecasting. Khashei et al. [10] studied artificial neural networks in hybrid models. Hybrid methods are now widely used for time series prediction. Zhang [11] proposed taking advantage of the unique strengths of ARIMA and ANN models in linear and nonlinear modeling. H. Liu et al. [12] studied hybrid methods for wind speed prediction based on time series, artificial neural networks (ANNs), and the Kalman filter (KF). Areekul et al. [13] studied a hybrid methodology that combines autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models to predict short-term electricity prices. Khashei et al. [14] and Khashei and Bijari [15] proposed a hybrid method based on the basic concepts of ANNs and fuzzy regression that yields more accurate results with incomplete data sets; they also proposed a hybrid of artificial neural networks and ARIMA models that forecasts more accurately than artificial neural networks alone. Aladag et al. [16] proposed a hybrid approach combining Elman's recurrent neural networks (ERNNs) with ARIMA models and applied it to the Canadian lynx data. In practical prediction, research methods are therefore often composed of the two types of models.
This paper studies three aspects of the trend of track cross level state changes. First, it analyzes track irregularity time series data and looks for the underlying relationships between the series using seven gray incidence degree measures; second, it predicts long-term track cross level changes at a fixed measuring point; finally, it predicts short-term changes of the track over time at a unit section. The paper corrects the inadequacies of the GM(1,1) model, which reflects only the general development trend and not the cyclical and random variation of the track cross level at a fixed measuring point, so that the accuracy of fitting and forecasting is greatly improved. For the unit section, the paper uses a random linear AR model and a Kalman filtering model to analyze the track state over time and to predict its future state. Together, these studies reveal the statistical laws of track state changes in the long and short term and allow the future state of the track to be forecast.
2. Data Analysis
2.1. Analysis of Track Irregularity Data
The idea of time series analysis has been applied in many areas of research, such as the relationship of following speed and spacing with driving time in drivers' safety-related approaching behavior [17, 18]. In track irregularity time series, the continuity of the track makes the series obtained at two adjacent inspection points highly similar. Figure 1 compares the track cross level values at mileage points K550.00166 and K550.00191.
Figure 1 shows that the data obtained at the two adjacent measuring points are highly similar. The two series diverge markedly, however, during the 23rd to 26th inspections, which indicates a large change in track state during this period.
Because the relationships between time series curves are complex, it is not easy to find a standard or fixed formula that describes a time series curve; only a composite evaluation of the changes and the developing tendency of the time series data can be given. This paper therefore analyzes and compares seven incidence degree algorithms to identify the relationships that exist between track irregularity time series. The seven incidence degree formulas [19] are the displacement incidence degree (DID), absolute incidence degree (AID), improved absolute incidence degree (IAID), T incidence degree (TID), slide incidence degree (SID), first-order difference incidence degree (FODID), and second-order difference incidence degree (SODID). These seven incidence degrees are used to reflect the correlation between time series curves. Table 1 shows the seven incidence degrees between the actual cross level time series and the reference cross level time series.
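As a hedged illustration of how an incidence degree between cross level curves can be computed, the sketch below implements Deng's classic gray relational degree. It is swapped in for illustration and may differ from all seven variants compared in this paper, whose formulas are given in [19]. The function name, the distinguishing coefficient rho = 0.5, and the sample values are assumptions, not taken from the paper.

```python
def grey_relational_degrees(reference, candidates, rho=0.5):
    """Deng's gray relational degree of each candidate series to a reference.

    rho is the distinguishing coefficient; 0.5 is the customary choice
    (an assumption here, not a value taken from the paper).
    """
    # Pointwise absolute differences between the reference and each candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    degrees = []
    for row in deltas:
        # Relational coefficient at each point, then the mean over the series.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Hypothetical cross level values (mm) at one point and two comparison series.
reference = [1.0, 2.0, 3.0, 4.0]
near = [1.1, 2.1, 2.9, 4.0]
far = [3.0, 0.0, 5.0, 1.0]
degree_near, degree_far = grey_relational_degrees(reference, [near, far])
```

A series that follows the reference closely, such as the one at an adjacent measuring point, receives a degree near 1, while a dissimilar series receives a clearly lower value.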
 
The relationship of cross level data between adjacent hours is shown in Figure 2. 
Figure 2 shows that cross level irregularity data from two adjacent inspections follow a linear trend with a slope of approximately 1. A large deviation from this slope indicates that the track cross level state has changed greatly between the two inspections and needs special attention.
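The slope check described above can be sketched as an ordinary least squares fit of one inspection against the previous one. This is a minimal sketch under stated assumptions: the helper names and the tolerance of 0.2 around the slope of 1 are hypothetical, since the paper does not state a threshold.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def needs_attention(previous, current, tolerance=0.2):
    """Flag a pair of inspections whose fitted slope departs from 1.

    tolerance is a hypothetical threshold chosen for illustration.
    """
    slope, _ = fit_line(previous, current)
    return abs(slope - 1.0) > tolerance


# Hypothetical cross level values (mm) from two adjacent inspections.
previous = [-2.0, -1.0, 0.0, 1.0, 2.0]
stable = [v + 0.1 for v in previous]   # same shape, small offset
changed = [0.5 * v for v in previous]  # amplitude halved
```

With these sample values, `needs_attention(previous, stable)` is false and `needs_attention(previous, changed)` is true.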
2.2. Analysis of Track Irregularity Time Series
Track inspection data are obtained at a roughly fixed time interval (half a month) from geometry state detection along the mileage range of a railway line. The time series of track geometry state changes has the following characteristics.
(1) Each Data Element of the Original Time Series Is a Data Set
In studying the variation law of the detection data, each set of detection data on a unit section is treated as one data unit. The sequence of data units within a certain time frame forms a time series, which is the object of study.
Each data unit of the original time series is the data set recorded at the unit section at one time point of the series; its records comprise mileage, gauge, longitudinal level, cross level (L), cross level (R), alignment (L), alignment (R), and twist.
(2) Data Transformation Is Necessary
Since each data unit is not a single value but a data set for the unit section, it must be transformed into data that reflect the real characteristics of the geometry state of this section at the given time point.
The transformed value reflects the characteristics of the entire section of track geometry at that time point. The changes of the transformed time series data at the unit section are shown in Figure 3.
(3) Time Series Data Are Small Data Sets
To keep the track in good condition and ensure operational safety, maintenance is needed at regular intervals as the track state changes. Only the data between two maintenance operations can be treated as the object of study and as one time series, which means that the data set is small and covers a short time span. We therefore need a forecasting method that remains effective even though historical data are limited.
Figure 4 marks the limit of track geometry state change (deterioration) and the times at which maintenance and repair operations occurred. It shows that the track state condition changes cyclically.
(4) Data Selection
The track irregularity data used in this paper were collected by a track inspection car and provided by the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. The cross level irregularity data are selected as the object of this research. The data cover the up line of the Beijing-Kowloon railway from K550 + 000 to K550 + 075, from the second track inspection in late February 2008 to the second track inspection in late May 2009: a total of 31 inspections, each of which is a data array of 300 cross level values.
3. Medium and Long-Term Track State Change Models
3.1. Improvement of the GM(1,1) Model and Prediction
GM(n, h) denotes a grey model of h variables expressed by an n-order differential equation. In benefit analysis and production forecasting in economics and agroecology, we generally work on only one variable (a result), so that h = 1. When n is too large, the model becomes too difficult to calculate; the GM(1,1) model is therefore the one commonly used. G stands for gray and M for model, and GM(1,1) denotes the first-order, single-variable grey model [1], which can be represented by the linear differential equation (3.1). The GM(1,1) model is usually used to predict sequences with an exponential growth trend, for which it usually gives good accuracy. In reality, however, most series do not grow exponentially and often contain outliers, which limits the range of applications of GM(1,1). The model therefore needs to be improved by preprocessing the raw sequence so that it can be applied more widely.
The track cross level irregularity data at a fixed measuring point fluctuate around zero and are not monotonic. The model is improved as follows. First, the data are translated so that they fluctuate about zero, and a fixed positive constant is then added to every element, so that the new time series is positive. Next, the new time series is smoothed with a power function, which weakens the impact of outliers on the fitted data. In this paper, the positive constant is chosen as the integer value of twice the maximum absolute value among all elements of the original series.
According to the analysis of the raw track cross level sequence, we select 15 as the positive constant. The exponent of the power function, which ranges from 0 to 1, is determined from the degree of dispersion of the newly constructed data and the characteristics of the cross level irregularity. The original series is reconstructed with this method into a new series, and its accumulated generating operation (AGO) sequence is then constructed by cumulative summation.
The derivative at each discrete time point is then replaced by its approximation at that point, and the resulting differential equation is solved.
The two coefficients of the regression curve are obtained by the least squares method, which gives the solution (3.4).
Substituting the values of the two coefficients into (3.4) gives the GM(1,1) prediction model of the track cross level state changes. Because the calculation works on a data array to which a fixed value has been added, that value is removed again in the final predictive expression (3.5).
Formula (3.5) can be used to fit the trend of the historical track cross level data. The fitted trend curve and the actual curve are shown in Figure 5.
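The procedure of Section 3.1 can be sketched in code as follows. This is a minimal sketch in standard textbook GM(1,1) notation, which may differ from the authors' formulas: the series is shifted by a positive constant, smoothed by a power function, accumulated (AGO), fitted by least squares with mean background values, and restored by inverting the two transforms. The function names are hypothetical, and the exponent alpha = 0.5 in the sample call is an assumption; only the shift of 15 comes from the text.

```python
import math


def gm11_fit(x0):
    """Least squares estimate of (a, u) in x0(k) + a * z(k) = u."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) needs at least four points")
    x1, total = [], 0.0
    for value in x0:                      # AGO: cumulative sums
        total += value
        x1.append(total)
    # Background values: mean of adjacent accumulated values.
    z = [0.5 * (x1[k - 1] + x1[k]) for k in range(1, len(x1))]
    y = x0[1:]
    n = len(z)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    slope = szy / szz                     # regression y = slope * z + u
    return -slope, mean_y - slope * mean_z


def gm11_predict(first, a, u, steps):
    """Restored (non-accumulated) model values for k = 0 .. steps - 1."""
    def accumulated(k):
        if abs(a) < 1e-12:                # degenerate case: linear growth
            return first + u * k
        return (first - u / a) * math.exp(-a * k) + u / a
    values = [first]
    for k in range(1, steps):
        values.append(accumulated(k) - accumulated(k - 1))
    return values


def improved_gm11(raw, shift, alpha, horizon=0):
    """Shift and power-smooth the data, fit GM(1,1), then undo both transforms.

    Requires value + shift > 0 for every element of raw.
    """
    transformed = [(value + shift) ** alpha for value in raw]
    a, u = gm11_fit(transformed)
    fitted = gm11_predict(transformed[0], a, u, len(raw) + horizon)
    # Clamp at zero before the fractional power to stay in real numbers.
    return [max(f, 0.0) ** (1.0 / alpha) - shift for f in fitted]


# Hypothetical cross level values (mm) at one measuring point.
raw_series = [0.8, -1.2, 1.5, 0.3, -0.6, 1.1, 0.2, -0.9]
trend = improved_gm11(raw_series, shift=15.0, alpha=0.5, horizon=2)
```

The returned list contains the fitted trend for the historical points followed by `horizon` extrapolated values. As the text notes, the curve is smooth and captures only the overall trend.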
When the gray model GM(1,1) is used to forecast the transformed time series data (see Section 2.2) at the unit section, the predictive result is as shown in Figure 6.
Figures 5 and 6 show that the GM(1,1) predictive curve is smooth and deviates considerably from the actual values; it reflects only the overall trend, not the cyclical changes and random fluctuations, and so cannot be applied directly to forecast the track state. The GM(1,1) model therefore needs residual correction to meet the forecasting requirements.
3.2. Gray Model GM(1,1) with Residual Modification
Because the residuals are large, GM(1,1) is inaccurate in predicting the actual trend of track state change and cannot by itself predict the medium- and long-term track state changes. In this paper, a method based on trigonometric residual modification is presented to improve the predictive accuracy.
The time series of track geometry state changes is cyclical, as the analysis of the historical trend of the cross level shows, and trigonometric functions are inherently periodic. In this paper, a trigonometric function is therefore used to correct the residuals of the prediction model. Here, the residual is the actual value minus the predicted value.
The residual is modeled by the trigonometric wave (3.6), whose parameters are the amplitude of the wave mode, the cycle, and the index of the inspection in the time interval sequence; the amplitude is given by formula (3.7).
The cycle is obtained by trigonometric wave mode matching under the principle of minimum cumulative error between the fitted values and the actual values, so that the posteriori error is the smallest and the small error probability is the largest. The amplitude of the wave mode is then calculated by formula (3.7).
Substituting the cycle and the amplitude into (3.6) gives the revised residual formula.
Combining the residual formula with formula (3.5) gives the final forecast expression after residual adjustment.
Let us predict track cross level state with the formula (3.9). The predicted values and actual values are shown in Figure 7.
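A minimal sketch of the trigonometric residual correction follows. It assumes the residual wave has the form A sin(2πk/T), which is only one plausible reading of formula (3.6); the paper's exact expression and fitted values are not reproduced here. The cycle T is chosen from a candidate list by minimum squared error, and the amplitude A follows in closed form by least squares. All names and sample values are hypothetical.

```python
import math


def fit_sine_residual(residuals, candidate_periods):
    """Return (amplitude, period) minimizing the squared residual error.

    Assumes residual(k) = A * sin(2 * pi * k / T) for k = 1, 2, ...
    """
    best = None
    for period in candidate_periods:
        wave = [math.sin(2.0 * math.pi * k / period)
                for k in range(1, len(residuals) + 1)]
        energy = sum(w * w for w in wave)
        if energy < 1e-12:
            continue                      # wave vanishes at every sample
        # Closed-form least squares amplitude for this period.
        amplitude = sum(e * w for e, w in zip(residuals, wave)) / energy
        error = sum((e - amplitude * w) ** 2 for e, w in zip(residuals, wave))
        if best is None or error < best[2]:
            best = (amplitude, period, error)
    if best is None:
        raise ValueError("no usable candidate period")
    return best[0], best[1]


def corrected_forecast(trend_value, amplitude, period, k):
    """Trend prediction plus the fitted residual wave at inspection index k."""
    return trend_value + amplitude * math.sin(2.0 * math.pi * k / period)


# Hypothetical residuals (actual minus GM(1,1) prediction) for 24 inspections.
sample_residuals = [1.5 * math.sin(2.0 * math.pi * k / 8) for k in range(1, 25)]
amp, cycle = fit_sine_residual(sample_residuals, range(2, 13))
```

On this synthetic input the search recovers the cycle of 8 inspections and the amplitude of 1.5; real residuals would also contain random fluctuation that the wave does not absorb.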
When the residual-modified gray model GM(1,1) is used to forecast the transformed time series data (see Section 2.2) at the unit section, the prediction formula is (3.10), and the prediction result is shown in Figure 8.
As can be seen from Figures 7 and 8, compared to the original forecasting trend curve, the modified forecasting trend curve is much closer to the actual value. It has a better degree of fitting and can reflect the cyclical changes of the track cross level state. Therefore, the revised model can be applied to forecast the future track cross level state trends in the medium and in the long term.
In gray forecasting, a prediction with good fitting and extrapolation has a smaller posteriori error ratio and a larger small error probability, which indicates a large probability of small errors and a high prediction accuracy [20]. Following statistical theory, we examine the accuracy of the track state prediction using the posteriori error ratio and the small error probability and compare it with the predictive accuracy of the original GM(1,1) model (see Table 2).

The comparison shows that the posteriori error variance ratio of the GM(1,1) model after residual modification is significantly smaller than that of the original model; the fitting and extrapolation of the modified model are therefore better, and the predictive accuracy is improved.
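The two accuracy measures can be sketched as below, using the definitions customary in gray forecasting: the posteriori error ratio C is the standard deviation of the residuals divided by that of the actual data, and the small error probability P is the share of residuals whose deviation from the mean residual is below 0.6745 times the standard deviation of the actual data. These definitions are assumed to match the ones behind Table 2; the function name and sample values are hypothetical.

```python
import math


def posterior_error_check(actual, predicted):
    """Return (C, P): posteriori error ratio and small error probability."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    mean_residual = sum(residuals) / n
    # Population standard deviations of the data and of the residuals.
    s1 = math.sqrt(sum((a - mean_actual) ** 2 for a in actual) / n)
    s2 = math.sqrt(sum((e - mean_residual) ** 2 for e in residuals) / n)
    if s1 == 0.0:
        raise ValueError("actual data must not be constant")
    ratio = s2 / s1
    small = sum(1 for e in residuals if abs(e - mean_residual) < 0.6745 * s1)
    return ratio, small / n


# Hypothetical actual values with a perfect and a poor prediction.
actual_values = [1.0, 2.0, 3.0, 4.0, 5.0]
c_good, p_good = posterior_error_check(actual_values, actual_values)
c_poor, p_poor = posterior_error_check(actual_values, [5.0, 4.0, 3.0, 2.0, 1.0])
```

A smaller C together with a larger P indicates a better model, which is the sense in which the residual-modified model improves on the original one.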
4. Short-Term Prediction Models of Cross Level State Change
4.1. Prediction Based on AR Model
The track cross level irregularity time series is smooth and consistent with the characteristics of a stationary random sequence, so there is no need to remove a trend by differencing. Although no definite model describes track state changes in the long run, the state change over a short period can still be treated as approximately linear. To study how the overall cross level state of a unit section changes over time, each inspection of the selected unit section is treated as a one-dimensional data array of 300 values, and these arrays form the track cross level irregularity time series.
The sample mean is then subtracted from every value to generate a time series with zero mean.
The autocovariance function is the second-order mixed central moment of the values of a random signal at two different moments. The autocorrelation function describes the degree of incidence between values of the time series separated by a given lag. The partial autocorrelation function excludes the influence of the intermediate variables; the two functions are closely related and together reflect the true degree of incidence between two variables [21]. The sample autocovariance function, autocorrelation function, and partial autocorrelation function are shown in Table 3.

Table 3 shows that the absolute values of the sample autocorrelation function become smaller and smaller as the lag increases; the autocorrelation function therefore tails off. Beyond a certain lag, at most one value of the sample partial autocorrelation function remains significant; the partial autocorrelation function is therefore truncated at that lag.
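The three sample functions of Table 3 can be computed as in the following sketch. It uses the standard biased estimator of the autocovariance and the Durbin-Levinson recursion for the partial autocorrelation; whether the authors used exactly these estimators is an assumption, and the function names are hypothetical.

```python
def autocovariance(x, max_lag):
    """Biased sample autocovariance for lags 0 .. max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [sum(centered[t] * centered[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def autocorrelation(x, max_lag):
    """Sample autocorrelation: autocovariance divided by the lag-0 value."""
    gamma = autocovariance(x, max_lag)
    return [g / gamma[0] for g in gamma]


def partial_autocorrelation(rho):
    """Durbin-Levinson recursion on autocorrelations rho[0] = 1, rho[1], ...

    Returns the partial autocorrelations for lags 1 .. len(rho) - 1.
    """
    pacf, previous = [], []
    for k in range(1, len(rho)):
        numerator = rho[k] - sum(previous[j] * rho[k - 1 - j] for j in range(k - 1))
        denominator = 1.0 - sum(previous[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = numerator / denominator
        # Update the lower-order coefficients, then append the new one.
        previous = [previous[j] - phi_kk * previous[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical autocorrelations of an AR(1) process with coefficient 0.6.
ar1_rho = [0.6 ** k for k in range(4)]
ar1_pacf = partial_autocorrelation(ar1_rho)
```

For this AR(1) example the autocorrelation decays geometrically (it tails off), while the partial autocorrelation is 0.6 at lag 1 and zero afterwards (it is truncated), which is the pattern used above to identify an AR model.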
Through comprehensive analysis, the prediction model is defined as an AR model whose order is the truncation lag of the partial autocorrelation function. The parameter estimates can be calculated with the Yule-Walker equations.
We get , , , and .
Thus, the AR model is
In the formula, the random disturbance error is a white noise sequence with zero mean and nonzero variance that is normally distributed, uncorrelated, and independent.
Taking the estimate on both sides of the AR model formula and substituting the estimated parameters gives the AR prediction formula:
In formula, when , .
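The Yule-Walker estimation and the one-step AR prediction can be sketched as follows. The sketch solves the Yule-Walker equations with the Levinson-Durbin recursion on sample autocovariances; the model order and coefficient values estimated in the paper are not reproduced here, and the function names and sample data are hypothetical.

```python
def yule_walker(x, order):
    """Estimate AR(order) coefficients by solving the Yule-Walker equations.

    Returns (mean, coefficients, noise_variance); coefficients[0] applies
    to the most recent value.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    gamma = [sum(centered[t] * centered[t + k] for t in range(n - k)) / n
             for k in range(order + 1)]
    coefficients, variance = [], gamma[0]
    for k in range(1, order + 1):
        # Levinson-Durbin step: reflection coefficient for lag k.
        acc = gamma[k] - sum(coefficients[j] * gamma[k - 1 - j]
                             for j in range(k - 1))
        reflection = acc / variance
        coefficients = [coefficients[j] - reflection * coefficients[k - 2 - j]
                        for j in range(k - 1)] + [reflection]
        variance *= 1.0 - reflection * reflection
    return mean, coefficients, variance


def predict_next(x, mean, coefficients):
    """One-step-ahead AR prediction from the most recent observations."""
    recent = x[-len(coefficients):][::-1]     # most recent value first
    return mean + sum(c * (v - mean) for c, v in zip(coefficients, recent))


# Hypothetical alternating series: strongly negative lag-1 correlation.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
ar_mean, ar_coefficients, ar_variance = yule_walker(alternating, order=1)
next_value = predict_next(alternating, ar_mean, ar_coefficients)
```

Here the single estimated coefficient is -0.99, so the prediction after the final value of -1 is 0.99, continuing the alternation.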
Figure 9 shows the predicted cross level irregularity data for late June 2009 together with the actual test data.
Comparing the forecast with the actual inspection data shows that the distributions of the actual and predicted values agree well and that the two data curves roughly coincide.
4.2. Prediction Model Based on Kalman Filtering
Kalman filtering estimates the current state from the previous state estimate and the current observation alone; the history of earlier observations or estimates is not needed. In the absence of maintenance, changes of track geometry are closely related to the passing gross weight: the deviation of track geometry from the standard value grows as the gross weight increases. Track geometry status is also affected by train speed: the higher the speed, the greater the force exerted on the track and the greater the influence on the track geometry status. The track geometry (detection data), the passing gross weight, and the train speed are therefore used as the technical indicators for track state prediction, and accumulated historical data can be used to build track state prediction models.
In the track inspection data analysis and forecasting model built on the Kalman filtering algorithm, the state is the actual value of the track inspection items, and it evolves through the transfer matrix; the observation is the value detected by the track inspection car, and it is related to the state through the observation matrix; the process noise is the deviation of the track state changes; the measurement noise is white Gaussian noise; and the output is the predictive value of the track geometry state. The prediction formula of Kalman filtering is as follows:
The formula involves the prediction value, the minimum mean-square deviation under the revised matrix, the transfer matrix, the observed value, the estimated value, and the measurement matrix.
The Kalman filter model is applied to forecast the cross level state at the next inspection. The detected cross level values and the predicted values are compared in Figure 10.
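The predict and update cycle of Kalman filtering can be sketched for a single measuring point as a scalar filter. This is a minimal sketch, not the authors' model: the transfer and observation factors of 1 and the noise variances q and r are hypothetical defaults, and the paper's matrices are not reproduced.

```python
def kalman_step(x_est, p_est, z, f=1.0, h=1.0, q=0.01, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its error variance
    z: current observation; f, h: transfer and observation factors
    q, r: process and measurement noise variances (hypothetical defaults)
    Returns (predicted state, updated state, updated error variance).
    """
    x_pred = f * x_est                        # state prediction
    p_pred = f * p_est * f + q                # predicted error variance
    gain = p_pred * h / (h * p_pred * h + r)  # Kalman gain
    x_new = x_pred + gain * (z - h * x_pred)  # correct with the innovation
    p_new = (1.0 - gain * h) * p_pred
    return x_pred, x_new, p_new


def one_step_forecasts(observations, x0=0.0, p0=1.0, f=1.0, h=1.0, q=0.01, r=0.1):
    """Forecast each observation before it arrives, plus the next unseen one."""
    x_est, p_est = x0, p0
    forecasts = []
    for z in observations:
        x_pred, x_est, p_est = kalman_step(x_est, p_est, z, f, h, q, r)
        forecasts.append(x_pred)
    forecasts.append(f * x_est)               # forecast for the next inspection
    return forecasts


# Hypothetical constant cross level reading (mm) over 50 inspections.
readings = [2.0] * 50
forecast_list = one_step_forecasts(readings)
```

Only the previous estimate and the current observation enter each step, which is why no history needs to be stored. The last element of `forecast_list` is the forecast for the inspection that has not yet taken place.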
4.3. Prediction Model Based on Artificial Neural Network
Artificial neural networks (ANNs) are widely used in function approximation, pattern recognition, and data compression [22–24]. Compared with traditional models, an ANN offers robustness, timely forecasts, high nonlinearity, and a strong self-adaptive learning ability. The network usually has an input layer, an output layer, and a hidden layer. One advantage of an ANN is that it can realize any nonlinear mapping from input to output, provided there are enough hidden layers and hidden nodes. The relationship between the input nodes and the output nodes of the ANN is as follows [10]:
The formula involves the hidden layer transfer function, the actual output, the connection weights, the number of input nodes, and the number of hidden nodes. The neural network model thus performs a nonlinear functional mapping from the past observations to the future value.
This mapping is a function determined by the network structure and the connection weights, which are collected in a vector of all parameters. Because track state changes are nonlinear and an ANN is flexible in nonlinear modeling, the ANN is applied to forecast the track data change. The forecasting result is shown in Figure 11.
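The input-output relationship of a single-hidden-layer network can be sketched as a forward pass. The sketch assumes the form commonly used for time series forecasting with ANNs (an output bias plus a weighted sum of logistic hidden units fed by the lagged observations); the logistic transfer function, the function names, and all weight values are assumptions, and training of the weights is not shown.

```python
import math


def logistic(v):
    """Logistic transfer function, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-v))


def ann_forecast(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: past observations in, one forecast value out.

    lags: the past observations, most recent first
    hidden_weights[j][i]: weight from input node i to hidden node j
    """
    hidden = [logistic(bias + sum(w * y for w, y in zip(weights, lags)))
              for weights, bias in zip(hidden_weights, hidden_biases)]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical network: 3 input nodes (lags) and 2 hidden nodes.
past_values = [0.4, -0.2, 0.1]
forecast_value = ann_forecast(
    past_values,
    hidden_weights=[[0.5, -0.3, 0.2], [-0.4, 0.1, 0.6]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
```

In practice the weights would be fitted to the historical inspection data, for example by backpropagation, before the forward pass is used for forecasting.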
4.4. Comparison of Three Prediction Models
The specific error distributions of the AR model, the Kalman filtering model, and the ANN model are shown in Table 4.

It can be seen from Table 4 that the predictive accuracy of AR and Kalman filtering models is similar, and the predictive accuracy of ANN model is slightly higher than the previous two.
5. Conclusions
A comprehensive assessment of the incidence degrees between the track irregularity indicators shows that time series with higher incidence values normally have a stronger correlation between factors or between processes. The calculated incidence degrees agree well with the actual situation, which provides a reliable basis for choosing modeling variables and analyzing factors. The GM(1,1) model, improved according to the features of the track cross level data, can predict the development of the track state at a fixed measuring point in the medium and long term. With residual modification, the fitted curve reflects the cyclical changes of the cross level state over time. Statistical validation shows that the posteriori error value of the residual-corrected model falls from 65% to 43% compared with the original model, so that it reflects the changes of the cross level state more accurately. The random linear AR model, Kalman filtering, and the ANN are used to predict the short-term state changes of the unit section. The results show that the accuracy of the ANN is slightly higher than that of the AR model and Kalman filtering. Together, the four models cover the long-term and short-term track state changes at a fixed measuring point and at a unit section.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (General Projects) (Grant no.: 61272029), National Key Technology R&D Program (Grant no.: 2009BAG12A10), China Railway Ministry Major Program (2008G017A), and State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University (Contract no.: RCS2009ZT007).
References
[1] D. Julong, The Primary Methods of Grey System Theory, Huazhong University of Science and Technology Press, Wuhan, China, 2005.
[2] G. Liu and J. Yu, “Gray correlation analysis and prediction models of living refuse generation in Shanghai city,” Waste Management, vol. 27, no. 3, pp. 345–351, 2007.
[3] M. Marcellino, J. H. Stock, and M. W. Watson, “A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series,” Journal of Econometrics, vol. 135, no. 1-2, pp. 499–526, 2006.
[4] J. Ding, L. Han, and X. Chen, “Time series AR modeling with missing observations based on the polynomial transformation,” Mathematical and Computer Modelling, vol. 51, no. 5-6, pp. 527–536, 2010.
[5] R. Kalman, “A new approach to linear filtering and prediction problems,” Transaction of the ASME-Journal of Basic Engineering Series D, vol. 82, pp. 35–45, 1960.
[6] B. Feil, J. Abonyi, S. Nemeth, and P. Arva, “Monitoring process transitions by Kalman filtering and time-series segmentation,” Computers and Chemical Engineering, vol. 29, no. 6, pp. 1423–1431, 2005.
[7] R. Kandepu, B. Foss, and L. Imsland, “Applying the unscented Kalman filter for nonlinear state estimation,” Journal of Process Control, vol. 18, no. 7-8, pp. 753–768, 2008.
[8] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing, Explorations in the Microstructure of Cognition, MIT Press, 1986.
[9] P. P. Balestrassi, E. Popova, A. P. Paiva, and J. W. Marangon Lima, “Design of experiments on neural network's training for nonlinear time series forecasting,” Neurocomputing, vol. 72, no. 4–6, pp. 1160–1178, 2009.
[10] M. Khashei, M. Bijari, and G. A. Raissi Ardali, “Improvement of autoregressive integrated moving average models using fuzzy logic and artificial neural networks (ANNs),” Neurocomputing, vol. 72, no. 4–6, pp. 956–967, 2009.
[11] G. P. Zhang, “Time series forecasting using a hybrid ARIMA and neural network model,” Neurocomputing, vol. 50, pp. 159–175, 2003.
[12] H. Liu, H.-Q. Tian, and Y.-F. Li, “Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction,” Applied Energy, vol. 98, pp. 415–424, 2012.
[13] P. Areekul, T. Senjyu, H. Toyama, and A. Yona, “A hybrid ARIMA and neural network model for short-term price forecasting in deregulated market,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 524–530, 2010.
[14] M. Khashei, S. R. Hejazi, and M. Bijari, “A new hybrid artificial neural networks and fuzzy regression model for time series forecasting,” Fuzzy Sets and Systems, vol. 159, no. 7, pp. 769–786, 2008.
[15] M. Khashei and M. Bijari, “An artificial neural network (p, d, q) model for timeseries forecasting,” Expert Systems with Applications, vol. 37, no. 1, pp. 479–489, 2010.
[16] C. H. Aladag, E. Egrioglu, and C. Kadilar, “Forecasting nonlinear time series with a hybrid methodology,” Applied Mathematics Letters, vol. 22, no. 9, pp. 1467–1470, 2009.
[17] W. Wang, W. Zhang, H. Guo, H. Bubb, and K. Ikeuchi, “A safety-based approaching behavioural model with various driving characteristics,” Transportation Research Part C, vol. 19, no. 6, pp. 1202–1214, 2011.
[18] W. Wang, Vehicle’s Man-Machine Interaction Safety and Driver Assistance, China Communications Press, Beijing, China, 2012.
[19] X. Xinping, “Theoretical study and reviews on the computation method of grey interconnet degree,” Systems Engineering-Theory & Practice, vol. 17, no. 8, pp. 77–82, 1997.
[20] J. H. Stock and M. W. Watson, “Implications of dynamic factor models for VAR analysis,” NBER Working Paper no. 11467, 2005.
[21] N. L. Yu, D. Y. Yi, and X. Q. Tu, “Analyzing autocorrelation and partial-correlation functions in time series,” Mathematical Theory and Applications, vol. 27, no. 1, pp. 54–57, 2007.
[22] M. Khayet, C. Cojocaru, and M. Essalhi, “Artificial neural network modeling and response surface methodology of desalination by reverse osmosis,” Journal of Membrane Science, vol. 368, no. 1-2, pp. 202–214, 2011.
[23] A. Norets, “Estimation of dynamic discrete choice models using artificial neural network approximations,” Econometric Reviews, vol. 31, no. 1, pp. 84–106, 2012.
[24] C. B. Cai, H. W. Yang, B. Wang, Y. Y. Tao, M. Q. Wen, and L. Xu, “Using near-infrared process analysis to study gas-solid adsorption process as well as its data treatment based on artificial neural network and partial least squares,” Vibrational Spectroscopy, vol. 56, no. 2, pp. 202–209, 2011.
Copyright
Copyright © 2012 Jia Chaolong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.