#### Abstract

The combination of linear and nonlinear methods is widely used in the prediction of time series data. This paper analyzes track irregularity time series data using gray incidence degree models and data transformation methods in order to find the underlying relationships between the time series. The GM(1,1) model, which is based on a first-order, single-variable linear differential equation, is adaptively improved and error-corrected and then used to predict the long-term trend of track irregularity at a fixed measuring point; the stochastic linear AR model, the Kalman filtering model, and an artificial neural network model are applied to predict the short-term trend of track irregularity at the unit section. Results for both the long-term and the short-term changes indicate that the models are effective and achieve the expected accuracy.

#### 1. Introduction

Track irregularity is a serious threat to the safety of train operation. Track irregularity data include environmental variables (gauge, longitudinal level, cross level, alignment, and twist) and effective variables (vertical acceleration and horizontal acceleration). The development of the track irregularity state is random and cannot be described by a deterministic function. Generally, it can be studied within a certain range by combining probability theory with analytical methods. At present, most studies focus on overall indicators that evaluate changes in the track state, whereas few studies address the changes of specific geometric parameters and the laws behind them. This remains a basic difficulty.

Linear and nonlinear methods are two groups of models employed to estimate time series. Deng Julong [1] proposed gray system theory in 1982. Gray system theory has been widely applied in control, forecasting, and decision making, and the GM model is its core. G. Liu and Yu [2] studied the main factors that could affect MLF generation by using the gray correlation coefficient method. Marcellino et al. [3] and Ding et al. [4] studied the autoregressive (AR) model for forecasting macroeconomic time series and for parameter estimation problems. AR is a principal model of random processes: it describes the target only through the historical values of the time series, without requiring mutually independent explanatory variables, and so avoids the difficulties of variable selection and multicollinearity that arise in ordinary regression prediction. Kalman [5] proposed the Kalman filter model in 1960. In the studies of Feil et al. [6] and Kandepu et al. [7], the Kalman model was applied to monitoring process transitions and to nonlinear state estimation. Rumelhart and McClelland [8] studied neural networks years ago. Balestrassi et al. [9] studied neural network training for nonlinear time series forecasting. Khashei et al. [10] studied artificial neural networks in hybrid models. Hybrid methods are now widely used for time series prediction. Zhang [11] proposed taking advantage of the unique strengths of ARIMA and ANN models in linear and nonlinear modeling. H. Liu et al. [12] studied hybrid methods for wind speed prediction based on time series, artificial neural networks (ANNs), and the Kalman filter (KF). Areekul et al. [13] studied a hybrid methodology that combines autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models to predict short-term electricity prices. Khashei et al. [14] and Khashei and Bijari [15] proposed a hybrid method, based on the basic concepts of ANNs and fuzzy logic, that could yield more accurate results with incomplete data sets; they also proposed a hybrid model of artificial neural networks using ARIMA models in order to obtain more accurate forecasts than artificial neural networks alone. Aladag et al. [16] proposed a hybrid approach combining Elman's recurrent neural networks (ERNNs) and ARIMA models and applied it to the Canadian lynx data. In practical prediction, research methods are therefore often composed of both types of models.

In this paper, three aspects of the trends of track cross level state changes are studied. First, the track irregularity time series data are analyzed with seven gray incidence degree formulas in order to find the underlying relationships between the time series; second, long-term track cross level changes at a fixed measuring point are predicted; finally, short-term changes of the track over time at the unit section are predicted. This paper corrects an inadequacy of the GM model, which reflects only the general development trend rather than the cyclical and random variation of the track cross level at the fixed measuring point, so that the accuracy of fitting and forecasting is greatly improved. For the unit section, this paper uses the random linear AR model and the Kalman filtering model to analyze the track state over time and to predict its future state. Together, these studies reveal the statistical laws of track state changes in the long and short term and allow the future state of the track to be forecast.

#### 2. Data Analysis

##### 2.1. Analysis of Track Irregularity Data

The idea of time series analysis has been applied in many areas of research, such as the relationship of following speed and spacing with driving time in driver’s safety-related approaching behavior [17, 18]. In track irregularity time series studies, the continuity of tracks leads to a great similarity between two random time series data obtained at two adjacent inspection points. The comparison of track cross level values at K550.00166 and K550.00191 mileage points is shown in Figure 1.

It can be seen from Figure 1 that the data obtained at the two adjacent measuring points are highly similar. A large inconsistency appears at the 23rd, 24th, 25th, and 26th inspections, which indicates a great change in the track state during this period.

Because the relationships between time series curves are complex, it is not easy to find a standard or fixed formula that describes a time series curve; only a composite evaluation of its changes and development tendency can be given. This paper therefore analyzes and compares seven incidence degree algorithms, since certain relationships exist between track irregularity time series. The seven incidence degree [19] formulas are the displacement incidence degree (DID), absolute incidence degree (AID), improved absolute incidence degree (IAID), T incidence degree (TID), slide incidence degree (SID), first-order difference incidence degree (FODID), and second-order difference incidence degree (SODID). These seven incidence degrees are used to reflect the correlation between time series curves. Table 1 shows the seven incidence degrees between the actual cross level time series and the reference cross level time series.
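As an illustration of how an incidence degree between two series can be computed, the following minimal sketch implements Deng's classical grey relational grade. This is an assumption for illustration only: it is not one of the seven formulas of [19], and the function name, the resolution coefficient `rho = 0.5`, and the single-comparison-series form are choices made here.

```python
def grey_relational_grade(reference, comparison, rho=0.5):
    """Deng's grey relational grade between two equal-length series.

    rho is the resolution (distinguishing) coefficient; 0.5 is a common
    default and is an assumption here, not a value taken from the paper.
    """
    if len(reference) != len(comparison) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    # Point-by-point absolute differences between the two curves.
    deltas = [abs(r - c) for r, c in zip(reference, comparison)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0  # identical series are perfectly related
    # Relational coefficient at each point, averaged into a single grade.
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)


grade = grey_relational_grade([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

A grade close to 1 means the two curves stay close at every inspection; in practice the series are usually normalized before the grade is computed.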

It can be found from Figure 2 that changes on two adjacent cross level irregularity state data show the linear trend, with an approximated slope of 1. If there is a large deviation from the slope, it will illustrate that the two adjacent inspection data on track cross level state have been changed greatly and are in need of special attention.

##### 2.2. Analysis of Track Irregularity Time Series

Track inspection data refers to the data obtained within a roughly fixed time interval (a half month), which is generated from geometry state detection along the mileage range of railway line. The time sequence of track geometry state changes with the following characteristics.

(1) Data Elements of Original Time Series Is a Data Set
In the study of the variation law of detection data, each set of detection data on a certain unit section is considered as a data unit. The data sequence consisting of the data units within a certain time frame forms a time series and is the object of study. The original time series data is described as follows:
In the formula, is the prediction data set at the unit section, constituting a time series of data units, , is the time point in time series, is mileage, is gauge, is longitudinal level, is cross level (L), is cross level (R), is alignment (L), is alignment (R), and is twist.

(2) Data Transformation Is Necessary
Since each data unit is not a single value but a data set for the unit section, it must be transformed to form data that reflect the real characteristics of the section geometry state at :
In the formula, reflects the characteristics of the entire section of the track geometry at . After transformation, changes of time series data at unit section are shown in Figure 3.

(3) Time Series Data Are Small Data Sets
In order to keep the track in good condition and to ensure operation safety, maintenance at regular intervals is needed as the track state changes. Only the data collected between two maintenance operations can be treated as one time series for study, which means that each series is a small data set covering a short timespan. We therefore need a forecasting method that remains effective even though historical data is limited.

As shown in Figure 4, refers to track geometry state changes (deterioration) limits and and refer to the exact time that maintenance and repair operations occurred within. It shows a cyclical changing trend of track state conditions.

(4) Data Selecting
In this paper, the track irregularity data measured by a track inspection car were provided by the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. The cross level irregularity data is selected as the object of this research. The data cover the Beijing-Kowloon up line from K550 + 000 to K550 + 075, from the second track inspection in late February 2008 to the second track inspection in late May 2009, a total of 31 inspections, each of which is a data array containing 300 cross level values.

#### 3. Medium and Long-Term Track State Change Models

##### 3.1. Improvement of GM Model and Prediction

GM(n, h) stands for a grey model of h variables expressed by nth-order differential equations. Generally speaking, in benefit analysis and production forecasting in fields such as economics and agroecology, only one variable (a result) is modeled, so h = 1. When n is too large, the model becomes too difficult to compute; thus GM(1,1) is commonly used. G represents gray, M represents model, and GM(1,1) stands for the first-order, single-variable grey model [1], which can be represented by the linear differential equation (3.1). The GM(1,1) model is usually used to predict sequences with an exponential growth trend, for which it generally gives good accuracy. In practice, however, most series do not grow exponentially and often contain outliers, which limits the range of application of GM(1,1). Thus, the pretreatment of the raw sequence needs to be improved so that the model can be applied more widely.

Track cross level irregularity data at a fixed measuring point is a data set that fluctuates around zero along the mileage, and the data itself is not monotonic. The improved model proceeds as follows: first, the data are translated so that they fluctuate around zero, and then a fixed positive constant is added to each value, so that the new time series data are positive. Next, the new time series data are smoothed using a power function . The resulting time series weakens the impact of outliers on the fitted data. In this paper, the positive constant is selected as the integer value of two times the maximum absolute value among all elements of the original series, that is, .
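The pretreatment described above can be sketched as follows. The function names are hypothetical, the exponent `alpha = 0.5` is only a placeholder for a value between 0 and 1, and rounding the constant up with `math.ceil` is an assumption about what "integer value" means.

```python
import math


def shift_constant(series):
    """Positive constant: integer value of twice the largest absolute value.

    Rounding up is an assumption; the text only says "integer value".
    """
    return math.ceil(2 * max(abs(x) for x in series))


def preprocess(series, shift, alpha):
    """Add the constant so all values are positive, then apply x ** alpha."""
    shifted = [x + shift for x in series]
    if min(shifted) <= 0:
        raise ValueError("shift must make every value positive")
    # An exponent between 0 and 1 compresses large values (outliers).
    return [x ** alpha for x in shifted]


def inverse_preprocess(values, shift, alpha):
    """Map model outputs back to the original cross level scale."""
    return [v ** (1.0 / alpha) - shift for v in values]


raw = [-3.0, 1.0, 2.5]           # made-up cross level values
c = shift_constant(raw)           # 6 for this toy series
smoothed = preprocess(raw, c, alpha=0.5)
restored = inverse_preprocess(smoothed, c, alpha=0.5)
```

Any forecast produced on the smoothed scale has to be passed through `inverse_preprocess` before it is compared with measured values.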

According to the analysis of the track cross level sequence of raw data, we find that . So we select 15 as the positive constant value. According to the degree of the dispersion of the newly constructed data, , which ranges from 0 to 1, can be determined. Combined with data characteristics of cross level irregularity, we set . Reconstruct the original series by applying new methods of constructing series and then get the new series , adding up, AGO sequence is constructed:

When is on a point value , approximation taken in the point , that is,

After transformation, let us solve differential equations

Then we can obtain the coefficients and of the regression curve according to the least squares method. The expression resulted from the solutions of

Next, take the values of and into (3.4), we can obtain GM prediction model of the track cross level state changes. Because the calculation data is a data array which is added up based on a fixed value; the final predictive value of the expression is as follows:

With the application of the formula (3.5), we can predict the law of trend on historic track cross level data. The trend curve and actual curve fitting is shown in Figure 5.
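For reference, the textbook GM(1,1) construction can be written as below. The symbols $x^{(0)}$, $x^{(1)}$, $z^{(1)}$, $a$, and $b$ follow the usual grey-system convention and are assumed here; they are not taken from this paper, and the sketch omits the constant shift and power smoothing of the improved model.

```latex
\begin{align}
x^{(1)}(k) &= \sum_{i=1}^{k} x^{(0)}(i), \qquad k = 1, \dots, n
  && \text{(accumulated generating operation, AGO)} \\
z^{(1)}(k) &= \tfrac{1}{2}\left[ x^{(1)}(k) + x^{(1)}(k-1) \right], \qquad k = 2, \dots, n
  && \text{(background value)} \\
\frac{d x^{(1)}}{d t} + a\, x^{(1)} &= b
  && \text{(whitening equation)} \\
\begin{bmatrix} a \\ b \end{bmatrix} &= \left( B^{T} B \right)^{-1} B^{T} Y, \quad
B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}, \quad
Y = \begin{bmatrix} x^{(0)}(2) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}
  && \text{(least squares)} \\
\hat{x}^{(1)}(k+1) &= \left( x^{(0)}(1) - \frac{b}{a} \right) e^{-a k} + \frac{b}{a}
  && \text{(time response)} \\
\hat{x}^{(0)}(k+1) &= \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k)
  && \text{(inverse AGO)}
\end{align}
```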

When the gray model GM(1,1) is used to forecast the transformed time series data at the unit section (see Section 2.2), the predictive result is shown in Figure 6.
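A minimal sketch of fitting and forecasting with a plain GM(1,1) model is given below. It assumes the standard textbook formulation with development coefficient `a` and grey input `b` (names chosen here), expects a positive input series, and leaves out the paper's shift and power pretreatment.

```python
import math
from itertools import accumulate


def fit_gm11(x0):
    """Estimate (a, b) of the textbook GM(1,1) model from a positive series."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 points")
    x1 = list(accumulate(x0))                                # AGO sequence
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]     # background values
    y = x0[1:]
    # Ordinary least squares for the straight line y = -a * z + b.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    denom = m * szz - sz * sz
    if denom == 0:
        raise ValueError("degenerate background values")
    slope = (m * szy - sz * sy) / denom
    b = (sy - slope * sz) / m
    return -slope, b


def predict_gm11(x0, a, b, steps):
    """Fitted values for the observed points followed by `steps` forecasts."""
    if abs(a) < 1e-12:
        raise ValueError("a is too close to zero for the time response formula")

    def x1_hat(k):
        # Time response of the whitening equation, k = 0, 1, 2, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    out = [x0[0]]
    for k in range(1, len(x0) + steps):
        out.append(x1_hat(k) - x1_hat(k - 1))                # inverse AGO
    return out


history = [2.0 * 1.1 ** k for k in range(8)]   # toy exponential-like series
a, b = fit_gm11(history)
fitted_and_forecast = predict_gm11(history, a, b, steps=2)
```

Because the fitted curve is a single exponential, it stays smooth regardless of how the input fluctuates, which is why a plain GM(1,1) fit can only follow the overall trend.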

It can be seen from Figures 5 and 6 that the GM(1,1) predicted curve is smooth and deviates considerably from the actual values; it reflects only the overall trend, not the cyclical changes and random fluctuations, and so cannot be applied directly to forecast the track state. Therefore, the residuals of the GM(1,1) model need to be corrected to meet the forecasting requirements.

##### 3.2. Gray Model GM with Residual Modification

Since the residuals are large, GM(1,1) is quite inaccurate when predicting actual track state trends, so it cannot be used directly to predict medium- and long-term track state changes. This paper presents a method based on trigonometric residual modification to improve the predictive accuracy.

Time series of the track geometry state changes has cyclical characteristics according to the analysis of the historical changing trend of cross level. We find that trigonometric function has obvious cyclical features. In this paper, trigonometric function is used to correct residuals of the prediction model. Here, the residual refers to the actual value minus the predicted value, that is, . Set

In the formula, is the amplitude of wave mode, , is the cycle, is the inspection time interval sequence. Because of , so . One has

With the principle of the minimum cumulative error of the fitted values and actual values, combined with the application of trigonometric wave mode matching method, we try to make sure that the posteriori error is the smallest and the small probability is the largest, and then we obtain . At the same time, the amplitude of wave mode calculated by the formula (3.7) is .

Take and into (3.6); then we obtain the revised residuals’ formula:

Combined with residual formula and the formula (3.5), the final forecast expression after residuals adjustment is

Let us predict track cross level state with the formula (3.9). The predicted values and actual values are shown in Figure 7.
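The idea of correcting the trend with a trigonometric residual term can be sketched as follows. This is an assumption-laden illustration: the residual is modeled as a single sine wave `A * sin(2*pi*k/T)`, the period is chosen by minimizing the squared fitting error rather than by the posterior error criteria used in the paper, and all names are hypothetical.

```python
import math


def fit_sine_residual(residuals, periods):
    """Choose (A, T) minimizing sum_k (e_k - A * sin(2*pi*k/T)) ** 2.

    residuals[k-1] is the actual value minus the predicted value at
    inspection k; `periods` is the list of candidate cycle lengths.
    """
    best = None
    for T in periods:
        s = [math.sin(2 * math.pi * k / T) for k in range(1, len(residuals) + 1)]
        ss = sum(v * v for v in s)
        if ss < 1e-12:
            continue  # this period gives an all-zero wave on the sample grid
        A = sum(e * v for e, v in zip(residuals, s)) / ss     # least squares amplitude
        sse = sum((e - A * v) ** 2 for e, v in zip(residuals, s))
        if best is None or sse < best[2]:
            best = (A, T, sse)
    if best is None:
        raise ValueError("no usable candidate period")
    return best[0], best[1]


def corrected_forecast(trend, A, T):
    """Add the fitted residual wave back onto the trend values."""
    return [y + A * math.sin(2 * math.pi * k / T)
            for k, y in enumerate(trend, start=1)]


# Toy residuals with a known cycle of 6 inspections and amplitude 0.8.
toy = [0.8 * math.sin(2 * math.pi * k / 6) for k in range(1, 25)]
A, T = fit_sine_residual(toy, periods=range(3, 13))
```

Once `A` and `T` are fixed, the same wave can be extended beyond the last inspection, so the corrected forecast keeps the cyclical pattern that the smooth trend lacks.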

When gray model GM after residual modification is used to forecast time series data (see Section 2.2) at the unit section, the prediction formula is (3.10), and the prediction result is shown in Figure 8:

As can be seen from Figures 7 and 8, compared to the original forecasting trend curve, the modified forecasting trend curve is much closer to the actual value. It has a better degree of fitting and can reflect the cyclical changes of the track cross level state. Therefore, the revised model can be applied to forecast the future track cross level state trends in the medium and in the long term.

In gray forecasting, a prediction with good fitting and extrapolation has a smaller posterior error ratio and a larger small error probability, which indicates a high probability of small errors and high prediction accuracy [20]. Following statistical theory, we examine the accuracy of the track state prediction using the posterior error ratio and the small error probability and compare it with the predictive accuracy of the original GM(1,1) model; see Table 2.

The comparison shows that the posterior error variance ratio of the GM(1,1) model after residual modification is significantly smaller than that of the original model; thus both the fitting and the extrapolation of the modified model are better, and the predictive accuracy is improved.
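The posterior error check can be sketched as below, assuming the usual grey-forecasting definitions: the ratio `C` of the residual standard deviation to the data standard deviation, and the small error probability `P` with the conventional 0.6745 threshold. The function and variable names are chosen here for illustration.

```python
import math


def posterior_check(actual, predicted):
    """Return (C, P): posterior error ratio and small error probability."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    residuals = [x - p for x, p in zip(actual, predicted)]
    mean_x = sum(actual) / n
    mean_e = sum(residuals) / n
    s1 = math.sqrt(sum((x - mean_x) ** 2 for x in actual) / n)      # data spread
    s2 = math.sqrt(sum((e - mean_e) ** 2 for e in residuals) / n)   # residual spread
    if s1 == 0:
        raise ValueError("actual series is constant")
    c = s2 / s1
    # Share of residuals whose deviation from the mean residual is small.
    p = sum(1 for e in residuals if abs(e - mean_e) < 0.6745 * s1) / n
    return c, p


c_ratio, p_small = posterior_check([1.0, 2.0, 3.0, 4.0, 5.0],
                                   [1.1, 1.9, 3.1, 3.9, 5.1])
```

A smaller `C` together with a larger `P` indicates a better model, which is the sense in which the two models are compared in Table 2.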

#### 4. Short-Term Prediction Models of Cross Level State Change

##### 4.1. Prediction Based on AR Model

The track cross level irregularity time series is smooth and consistent with the characteristics of a stationary random sequence, so there is no need to remove a trend by differencing. Although no definite model describes track state changes in the long run, the state change over a short period can still be considered approximately linear. To study how the overall cross level state of the unit section changes over time, each inspection of the selected unit section is treated as a one-dimensional array of 300 values. The track cross level irregularity time series data is

Then, , time series is generated with mean value zero, , and is the sample mean value of .

The autocovariance function is the second-order mixed central moment of the values of a random signal at two different moments. The autocorrelation function describes the degree of correlation between variables of the time sequence at a given lag. The partial autocorrelation function excludes the influence of the intermediate variables; the two functions are closely related, and together they reflect the true degree of correlation between two variables [21]. The values of the sample autocovariance function , autocorrelation function , and partial autocorrelation function are shown in Table 3.

It can be seen from Table 3 that when becomes greater, the previous four , absolute value of , are getting smaller and smaller. Therefore, we can see that the autocorrelation function is tailed. When , there is at most one that can make . Therefore, the sample partial autocorrelation function is truncated at the point at which .
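The quantities examined in Table 3 can be computed as in the following sketch. It assumes the biased sample autocovariance (divisor N) and obtains the partial autocorrelations with the Durbin-Levinson recursion; the function names are hypothetical.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    w = [v - mean for v in x]                      # zero-mean series
    gamma = [sum(w[t] * w[t + k] for t in range(n - k)) / n
             for k in range(max_lag + 1)]          # sample autocovariances
    return [g / gamma[0] for g in gamma]


def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk, k = 0 .. max_lag (Durbin-Levinson)."""
    pacf = [1.0]
    prev = []                                      # phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=1)
# Theoretical AR(1) autocorrelations 0.5 ** k: the PACF cuts off after lag 1.
pacf = pacf_from_acf([0.5 ** k for k in range(4)], max_lag=3)
```

An autocorrelation that decays gradually together with a partial autocorrelation that cuts off after some lag p is the usual signature of an AR(p) process.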

Through comprehensive analysis, the prediction model is defined as AR . We can calculate the parameter’s estimates using Yule-Walker equations.

We get , , , and .

Thus, the AR model is

In the formula, , is the random disturbance error, which is a white noise sequence with zero mean and nonzero variance, normally distributed, uncorrelated, and independent.

Taking the estimated value on both sides of formula (3.8) and then substituting the estimated parameters into the formula, we obtain the AR prediction formula:

In formula, when , .
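Estimating AR coefficients from the Yule-Walker equations and producing forecasts can be sketched as follows. The order `p` is left as a parameter, the equations are solved with the Levinson-Durbin recursion, and the function names are illustrative assumptions.

```python
def autocovariances(x, max_lag):
    """Biased sample autocovariances gamma_0 .. gamma_max_lag."""
    n = len(x)
    mean = sum(x) / n
    w = [v - mean for v in x]
    return [sum(w[t] * w[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def yule_walker(x, p):
    """Return (phi, sigma2): AR(p) coefficients and noise variance estimate."""
    gamma = autocovariances(x, p)
    phi = []
    sigma2 = gamma[0]
    for k in range(1, p + 1):
        # Reflection coefficient of the Levinson-Durbin recursion.
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1.0 - kappa * kappa
    return phi, sigma2


def forecast(x, phi, steps):
    """Iterated forecasts; future noise terms are replaced by their mean, 0."""
    mean = sum(x) / len(x)
    hist = [v - mean for v in x]
    out = []
    for _ in range(steps):
        nxt = sum(phi[j] * hist[-1 - j] for j in range(len(phi)))
        hist.append(nxt)
        out.append(nxt + mean)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]        # toy data, not track measurements
phi1, var1 = yule_walker(series, p=1)
next_value = forecast(series, phi1, steps=1)[0]
```

The forecasts are computed on the zero-mean series and the sample mean is added back at the end, matching the zero-mean construction used before fitting.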

The predicted cross level irregularity data for late June 2009 and the actual test data are shown in Figure 9.

By contrasting the forecasted data with the actual inspection data, it can be found that the distribution characteristics of actual value and the predictive value can agree with each other well, and the data curves roughly coincide with each other.

##### 4.2. Prediction Model Based on Kalman Filtering

Kalman filtering estimates the current state from the previous state estimate and the current observation, without needing the full history of observations or estimates. In the absence of maintenance, changes of track geometry are closely related to the passing gross weight: the deviation of track geometry from the standard value grows as the gross weight increases. Track geometry is also affected by train speed: the higher the speed, the greater the force exerted on the track and the greater the influence on the track geometry. The track geometry (detection data), passing gross weight, and train speed are used as the technical indicators for track state prediction, and accumulated historical data can be analyzed to build track state prediction models.

With the application of Kalman filtering algorithm, in the track inspection data analysis and forecasting models, is actual value of track inspect items; is the transfer matrix of the actual value; is track inspection car’s detecting value; is the observation matrix; is process noise , which is the deviation of the track state changes; is measuring noise , which is white Gaussian noise; is the predictive value of the state of the track geometry. The prediction formula of Kalman filtering is as follows:

In the formula, is prediction value, is the minimum mean-square deviation under the revised matrix, is transfer matrix, is an observed value and is an estimated value, and is the measured matrix.

The Kalman filter model is applied to forecast the cross level state at the next inspection. The comparison of the detected cross level values and the predicted values is shown in Figure 10.
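The predict and update cycle of a Kalman filter can be sketched for a single scalar state as below. This is a simplified illustration under stated assumptions: the transfer and observation coefficients `f` and `h`, the noise variances `q` and `r`, and the initial values are placeholders, and gross weight and train speed are not modeled.

```python
def kalman_1d(measurements, f=1.0, h=1.0, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter.

    Returns (one_step_predictions, filtered_estimates, next_prediction).
    """
    x, p = x0, p0
    predictions, estimates = [], []
    for z in measurements:
        # Predict: propagate the state and its error variance.
        x_pred = f * x
        p_pred = f * p * f + q
        predictions.append(h * x_pred)
        # Update: blend the prediction with the new measurement.
        gain = p_pred * h / (h * p_pred * h + r)
        x = x_pred + gain * (z - h * x_pred)
        p = (1.0 - gain * h) * p_pred
        estimates.append(x)
    # Forecast for the next, not yet observed, inspection.
    return predictions, estimates, f * x


preds, ests, nxt = kalman_1d([2.0] * 200)
```

Only the latest estimate and its variance are carried from one inspection to the next, so no earlier observations need to be stored.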

##### 4.3. Prediction Model Based on Artificial Neural Network

Artificial neural networks (ANNs) are widely used in function approximation, pattern recognition, and data compression [22–24]. Compared with traditional models, an ANN offers good robustness, timely forecasts, strong nonlinearity, and strong self-adaptive learning ability. Usually, the network has an input layer, a hidden layer, and an output layer. One advantage is that any nonlinear mapping between input and output can be achieved as long as there are enough hidden layers and hidden nodes. The relationship between the input nodes and output nodes of the ANN is as follows [10]:

In the formula, is a hidden layer transfer function, is actual output, is connection weights, is the number of input nodes, and is the number of hidden nodes. The neural network model performs a nonlinear functional mapping from the past observations to the future value , that is,

is a vector of all parameters and is a function determined by the network structure and connection weights. Because track state changes are nonlinear and ANN has a flexible capability in nonlinear modeling, ANN is applied to forecast the track data change. The forecasting result is shown in Figure 11.
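A single-hidden-layer network that maps the last p observations to the next value can be sketched as follows. The logistic transfer function, the layer sizes, the learning rate, and the class and function names are assumptions made for illustration; the paper's actual network configuration is not reproduced here.

```python
import math
import random


def make_windows(series, p):
    """Build (inputs, targets): the last p values predict the next one."""
    X = [series[t - p:t] for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    return X, y


class TinyMLP:
    """p inputs -> q logistic hidden nodes -> one linear output."""

    def __init__(self, p, q, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(q)]
        self.b_in = [0.0] * q
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(q)]
        self.b_out = 0.0

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(b + sum(w * v for w, v in zip(ws, x)))))
                for ws, b in zip(self.w_in, self.b_in)]

    def predict(self, x):
        h = self._hidden(x)
        return self.b_out + sum(w * v for w, v in zip(self.w_out, h))

    def mse(self, X, y):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

    def train(self, X, y, epochs=200, lr=0.05):
        """Plain stochastic gradient descent on the squared error."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                h = self._hidden(x)
                err = self.b_out + sum(w * v for w, v in zip(self.w_out, h)) - t
                for j in range(len(h)):
                    # Back-propagate through the logistic hidden node j.
                    grad_h = err * self.w_out[j] * h[j] * (1.0 - h[j])
                    self.w_out[j] -= lr * err * h[j]
                    self.b_in[j] -= lr * grad_h
                    for i in range(len(x)):
                        self.w_in[j][i] -= lr * grad_h * x[i]
                self.b_out -= lr * err


toy_series = [math.sin(0.3 * t) for t in range(120)]   # synthetic data
X, y = make_windows(toy_series, p=3)
net = TinyMLP(p=3, q=4, seed=0)
mse_before = net.mse(X, y)
net.train(X, y)
mse_after = net.mse(X, y)
```

After training, `net.predict(toy_series[-3:])` gives the one-step-ahead forecast; with real inspection data the inputs would normally be scaled first.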

##### 4.4. Comparison of Three Prediction Models

The specific error distributions of the AR model, the Kalman filtering model, and the ANN model are shown in Table 4.

It can be seen from Table 4 that the predictive accuracy of AR and Kalman filtering models is similar, and the predictive accuracy of ANN model is slightly higher than the previous two.

#### 5. Conclusions

After a comprehensive assessment of the incidence degrees between the various track irregularity indicators, we find that higher incidence values normally correspond to time sequences whose underlying factors or processes are more strongly correlated. The calculated incidence degrees agree well with the actual situation, which provides a reliable basis for choosing modeling variables and analyzing factors. The improved GM model, based on the features of track cross level data, can predict track state development at a fixed measuring point in the medium and long term. With residual modification, the fitted curve reflects the cyclical changes of the cross level state over time. Statistical validation shows that the posterior error ratio of the improved model with residual correction is reduced from 65% to 43% compared with the original model, so it reflects the changes of the cross level state more accurately. The random linear AR model, Kalman filtering, and ANN are used to predict the short-term state changes of the unit section. The results show that the accuracy of ANN is slightly higher than that of the AR model and Kalman filtering. Together, the four models cover both long-term and short-term track state changes at the fixed measuring point and the unit section.

#### Acknowledgments

This research was supported by the National Natural Science Foundation of China (General Projects) (Grant no.: 61272029), National Key Technology R&D Program (Grant no.: 2009BAG12A10), China Railway Ministry Major Program (2008G017-A), and State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University (Contract no.: RCS2009ZT007).