Abstract

To improve the quality of track maintenance work, it is desirable to estimate vehicle dynamic behavior from track geometry irregularities. This paper proposes a deep learning model to predict vehicle responses (e.g., vertical wheel-rail forces, wheel unloading rate, and car body vertical acceleration). In the proposed CA-CNN-MUSE model, convolutional neural networks (CNNs) are used to learn features of track irregularities, and multiscale self-attention mechanisms (MUSE) are employed to capture the long-term and short-term trends of sequences. Coordinate attention (CA) is introduced into the CNN to focus on important interchannel relationships and important spatial mileage points. The experiments were performed on data from a multibody simulation model of the vehicle system and on measured data from an actual high-speed line. The results show that CA-CNN-MUSE has high prediction accuracy for vertical vehicle responses and fast computation speed. The predicted time-domain waveforms and power spectral densities (PSDs) agree well with the actual vehicle responses. The main features of the lateral vehicle responses can also be captured by the proposed method, yet the results are not as good as the vertical ones.

1. Introduction

Assessment of track quality for track maintenance work is critical to ensure train running safety and passenger ride comfort in high-speed railways. The current evaluation method of track quality is based on the amplitudes of track geometry irregularities. However, a single track irregularity index is not enough to evaluate track quality without considering the vehicle dynamic response. For example, in some track sections the amplitude of each track irregularity does not exceed the limit, yet a large vehicle vibration response appears. Conversely, there are also track sections where the amplitudes of some track irregularities exceed the limits but do not cause a poor vehicle response. These problems show that the vehicle vibration response is the result of the nonlinear coupling of multiple track irregularities.

To improve the track quality assessment standards and the track maintenance work, research has emerged to relate track geometry irregularities to vehicle response. The key to these works is to find a model to accurately estimate the vehicle response from track irregularities. Then, the track geometry assessment can be performed by combining the indexes of actual track irregularities and the predicted vehicle responses. The models are not only helpful for the track maintenance department to find the track locations that cause undesirable vehicle responses and estimate the degradation trend of the track but also helpful for facilitating the upgrades of the vehicle design technology.

There are several ways to build a model that predicts vehicle response from track geometry irregularities. Some researchers focus on building a mechanical model to simulate the nonlinear dynamic behavior of the vehicle [1-3]. Using commercial software such as SIMPACK, researchers can build a 3D vehicle-track dynamic model to investigate the relationship between track irregularities and the dynamic performance of railway vehicles [4-6]. However, the performance of a mechanical model depends on the reliability of the model parameters and is easily affected by variations in real operating conditions. Therefore, putting a theoretical model into track maintenance practice is difficult because the actual parameters of a vehicle-track system are difficult to obtain and change with time. Besides, the numerical iteration method used in solving a mechanical model is time-consuming.

Some studies characterize the behavior of the vehicle-track system as linear transfer functions and estimate the parameters based on system identification theory [7-9]. However, system identification theory can be applied only to linear systems and only under constant speed conditions.

The alternative approach is to make purely data-driven predictions using machine learning techniques. Machine learning methods have high computational efficiency, perform well in modeling nonlinear mapping relationships, and can be designed for varying speeds. They can be further categorized into traditional methods and deep learning methods. Traditional machine learning approaches used to predict vehicle responses include the multilayer perceptron (MLP [10]), backpropagation (BP) neural networks [11], decision trees, support vector machines, and other regression algorithms, which are compared in [12], and NARX neural networks [13].

In recent decades, deep learning has achieved great success in sequence modeling. The main advantages of deep learning are strong nonlinear modeling ability, end-to-end training, and, as a result, a large improvement in model accuracy. The classical sequence modeling methods in deep learning are recurrent neural networks (RNN [14]), long short-term memory (LSTM [15]), and gated recurrent units (GRU [16]), and their combinations with convolutional neural networks (CNN). Li et al. used LSTM to estimate car body vertical and lateral acceleration based on a simulation model [17]. Ma et al. proposed a CNN-LSTM model to predict car body vibration acceleration from track irregularities [18]. The model uses two CNN layers to learn features of the sequences in different bands and feeds the extracted features into two LSTM layers to learn the mapping relationship between the input and the output. The accuracy of the model is superior to that of BP and LSTM.

Recently, with the emergence of the attention mechanism (AM [19]), deep learning has stepped into a new stage. The main idea of attention is to tell a model “what” and “where” to attend to, so that it focuses on the most relevant information instead of the entire sequence. In machine translation tasks, the traditional RNN and LSTM models have been replaced by a multihead self-attention mechanism, which achieves better accuracy and allows more parallel computation [20]. With its success in sequence-to-sequence tasks, the self-attention mechanism has become a standard component for capturing long-term dependence. However, the self-attention mechanism also has shortcomings. As self-attention layers are stacked deeper, certain input vectors receive too much attention, resulting in insufficient use of local information. Therefore, Zhao et al. replaced multihead self-attention with multiscale attention (MUSE) to encode global and local relations in parallel.

Attention mechanisms are widely deployed for boosting the performance of modern deep neural networks [21-26]. However, channel attention considers only interchannel information and neglects the importance of positional information. Hou et al. proposed a novel lightweight coordinate attention for mobile networks [27], which captures important interchannel relationships and precise positional information simultaneously.

In this paper, we establish a CA-CNN-MUSE model, which combines CNN with MUSE and introduces coordinate attention into the CNN to estimate vehicle response. The experiments show that the model improves estimation accuracy compared with the LSTM and CNN-LSTM models and has good computation speed.

2. The Proposed Model

2.1. Dataset

In this paper, we use two datasets to investigate the proposed model. The first is the “Inspection-Simulation” dataset, in which the track irregularities are inspected by a high-speed comprehensive inspection train (see Figure 1) in China and the vehicle responses are simulated by a multibody model of the vehicle. The multibody dynamics (MBD) software SIMPACK is employed for the simulation of vehicle-track interaction. The CRH380B EMU trailer is modeled, in which the car body, bogie frames, and wheel sets are simplified as rigid bodies and are connected by the primary and secondary suspensions. The second is the “Inspection” dataset, in which all the track irregularities and vehicle responses are inspected by track comprehensive inspection trains on actual high-speed railway lines. The items of track irregularities and vehicle responses differ between the two datasets.

The inspection-simulation dataset includes 4 track irregularities, i.e., left and right longitudinal levels and left and right alignment, coming from 3 Chinese high-speed railways, of which the mileage of line 1, line 2, and line 3 is 320 km, 120 km, and 80 km, respectively. The vehicle response data include 14 quantities: wheel-rail forces (left and right vertical forces of 1-axle, 2-axle, 3-axle, and 4-axle), wheel unloading rates (1-axle, 2-axle, 3-axle, and 4-axle), and vertical car body accelerations (front and rear).

The inspection dataset was extracted from the database of track comprehensive inspection trains on a high-speed line in China. The whole mileage is 17.5 km, the spatial sampling interval is 0.25 m, and the vehicle speed ranges from 215 km/h to 245 km/h. The track irregularities include 11 quantities: left and right longitudinal levels, left and right alignment, left and right long-wave longitudinal levels, left and right long-wave alignment, gauge, cross-level, and twist. The vehicle responses include 6 quantities: left and right vertical forces, left and right lateral forces, and vertical and lateral car body accelerations.

2.2. Model Structure

The prediction framework of vehicle dynamic response is shown in Figure 2. The network consists of three main modules: CNN, coordinate attention (CA), and multiscale attention (MUSE). The CNN is composed of alternately stacked convolution layers and pooling layers, which are used to extract different features of track irregularities. The results of the CNN are fed into two stacked multiscale attention layers, each of which is composed of a multihead self-attention mechanism and a depth-wise separable convolution to encode global and local relations in parallel. Coordinate attention [27] is added to the CNN to focus on important feature channels and important mileage points. Fully connected layers are placed before and after the multiscale self-attention module to perform nonlinear mapping and change the dimensions, and the last fully connected layer outputs the estimated vehicle responses.

2.2.1. CNN Module

Since the maximum management wavelength of track irregularity is 120 m, we take all track irregularity data within 120 m as input to fully capture long-wavelength information. We use the CNN to learn the features of the input vectors and feed the features into the multiscale attention layer to estimate the vehicle response at the current mileage point. Assuming the current mileage point is t, we estimate the vehicle response at T mileage points at a time:

$$Y = [y_1, y_2, \ldots, y_T].$$

The input should be

$$X = [X_1, X_2, \ldots, X_T],$$

where $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,L}]$, $i = 1, \ldots, T$, $x_{i,j}$ is the C-dimensional vector, $y_i$ is the K-dimensional vector, and L is the number of mileage points within 120 m. The size of the input sequence is T × L × C, and the size of the output sequence is T × 1 × K. For the inspection-simulation dataset, C is 4 and K is 14. For the inspection dataset, C is 8 and K is 6.
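As a rough illustration of these tensor shapes, the following sketch builds a T × L × C input from a track irregularity record. It assumes the 0.25 m sampling interval reported for the inspection dataset, which gives L = 120 / 0.25 = 480 points per window. It also assumes that each window trails the mileage point it belongs to. The window alignment, the value T = 8, and the helper name `build_windows` are illustrative choices, not details given in the paper.

```python
def build_windows(track, t, T, L):
    """Return T windows of L consecutive samples, one per mileage point.

    track: list of C-dimensional samples (one list per mileage point).
    Assumption: the window for point i covers samples i-L+1 .. i.
    """
    first = t - T + 1                      # first mileage point of the group
    if first - L + 1 < 0 or t >= len(track):
        raise ValueError("not enough samples around mileage point t")
    return [track[i - L + 1 : i + 1] for i in range(first, t + 1)]


SAMPLING_INTERVAL_M = 0.25                 # assumed spatial sampling interval
MAX_WAVELENGTH_M = 120.0                   # maximum management wavelength
L = int(MAX_WAVELENGTH_M / SAMPLING_INTERVAL_M)   # 480 points per window

C = 4                                      # inspection-simulation dataset
track = [[float(n)] * C for n in range(1000)]     # synthetic record
X = build_windows(track, t=999, T=8, L=L)
print(len(X), len(X[0]), len(X[0][0]))     # T, L, C
```

The value L = 480 obtained here matches the sequence length of 480 used for the LSTM baseline in Section 3.3.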

The structure and parameters of the CNN are the same as those in reference [27]. It includes two convolution layers (Conv1D), where the numbers of convolution kernels are 4 and 8, respectively, the kernel size is 1 × 5, and the stride is 1. Two max-pooling layers are used with a 1 × 2 pooling kernel and a stride of 2. After two stacked operations of convolution and max-pooling, the multidimensional features of track irregularities are extracted. The flatten layer compresses the multidimensional feature vectors into one-dimensional feature vectors to get global features.
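To make the effect of these layer settings concrete, the sketch below traces the sequence length through the two convolution and pooling stages for an input window of 480 points. The padding mode is not stated in the paper, so both the unpadded case and the length-preserving case (padding of 2 for a kernel of 5) are shown as assumptions. The function names are hypothetical.

```python
def out_len(n, kernel, stride, padding=0):
    """Output length of a 1D convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def cnn_feature_size(n, padding):
    """Length after the CNN and the size of the flattened feature vector."""
    n = out_len(n, kernel=5, stride=1, padding=padding)  # Conv1D, 4 kernels
    n = out_len(n, kernel=2, stride=2)                   # max pooling
    n = out_len(n, kernel=5, stride=1, padding=padding)  # Conv1D, 8 kernels
    n = out_len(n, kernel=2, stride=2)                   # max pooling
    return n, n * 8                                      # 8 output channels


print(cnn_feature_size(480, padding=0))  # (117, 936) without padding
print(cnn_feature_size(480, padding=2))  # (120, 960) with length-preserving padding
```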

2.2.2. Multiscale Attention

The diagram of multiscale attention is shown in Figure 2. This module consists of two parts in parallel: a multihead self-attention mechanism for capturing global features and a depth-wise convolution mechanism for capturing local features. For the input sequence x, the output y passing through the multiscale attention layer can be expressed as follows:

$$y = x + \mathrm{Attention}(x) + \mathrm{Conv}(x).$$

The multihead self-attention mechanism is responsible for learning representations of long-term dependencies. In this module, the input sequence X is projected into three representations, query Q, key K, and value V [20].

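Then, the output representation is calculated as follows:

$$\mathrm{Attention}(X) = \sigma\left(QW^{Q}, KW^{K}, VW^{V}\right)W^{O},$$

where $W^{Q}$, $W^{K}$, $W^{V}$, and $W^{O}$ are projection parameters, and $\sigma(Q, K, V) = \mathrm{softmax}\left(QK^{\top}/\sqrt{d_k}\right)V$, with $d_k$ the dimension of the keys.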
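The self-attention operation can be illustrated with a minimal single-head sketch of scaled dot-product attention as defined in [20]. For brevity, the learned projections are omitted (that is, they are assumed to be identity matrices) and only one head is computed. The helper names are illustrative, and this is not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, [list(c) for c in zip(*k)])          # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three sequence positions with two features each (synthetic values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(x, x, x)
print(len(out), len(out[0]))       # one output vector per position
print([round(sum(r), 6) for r in weights])  # each row of weights sums to 1
```

Because every position attends to every other position in a single matrix product, the dependence between distant mileage points is captured without the step-by-step recurrence of an LSTM.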

The convolution module is used to capture the local contextual sequence representations in the same mapping space. Based on the depth-wise convolution operation, the module uses three convolution submodules, which contain multiple cells with different kernel sizes of 1, 3, and 5 to capture features over different ranges. A gating mechanism is introduced to automatically select the weights of the different convolution cells and thus merge the information of the different convolution submodules. The depth-wise separable convolution first performs an independent convolution on each channel and then performs an ordinary convolution. The calculation process of the convolution module can be expressed as follows:

$$\mathrm{Conv}(X) = \sum_{i=1}^{n} \alpha_i\,\mathrm{Conv}_{k_i}(X),$$

where $\alpha_i$ is the weight coefficient; the output of the convolution cell with kernel size $k$ is as follows:

$$\mathrm{Conv}_{k}(X) = \mathrm{Depth\_conv}_{k}(X)\,W^{\mathrm{out}},$$

where $W^{\mathrm{out}}$ denotes the ordinary convolution applied after the per-channel convolution.
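The following minimal sketch shows one way to combine convolution cells with kernel sizes 1, 3, and 5 through a gate. It assumes that the gate normalizes learnable logits with a softmax, as in the original MUSE design, and that zero padding keeps the sequence length unchanged. The kernels, the identity pointwise matrix, and all function names are illustrative assumptions rather than the authors' settings.

```python
import math


def depthwise_conv(x, kernels):
    """Convolve each channel independently with its own odd-length kernel.

    x: list of positions, each a list of C channel values.
    Zero padding keeps the output the same length as the input.
    """
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        w = kernels[ch]
        half = len(w) // 2
        for i in range(n):
            acc = 0.0
            for j, wj in enumerate(w):
                pos = i + j - half
                if 0 <= pos < n:
                    acc += wj * x[pos][ch]
            out[i][ch] = acc
    return out


def pointwise(x, w):
    """Ordinary 1x1 convolution mixing channels; w is C_in x C_out."""
    return [[sum(xi * w[a][b] for a, xi in enumerate(row))
             for b in range(len(w[0]))] for row in x]


def gated_multi_kernel_conv(x, cells, gate_logits):
    """Softmax-weighted sum of depth-wise separable convolution cells."""
    m = max(gate_logits)
    e = [math.exp(a - m) for a in gate_logits]
    gates = [v / sum(e) for v in e]
    outs = [pointwise(depthwise_conv(x, dw), pw) for dw, pw in cells]
    n, c = len(outs[0]), len(outs[0][0])
    return [[sum(g * o[i][ch] for g, o in zip(gates, outs)) for ch in range(c)]
            for i in range(n)]


x = [[1.0, 2.0] for _ in range(9)]               # 9 positions, 2 channels
cells = []
for k in (1, 3, 5):
    dw = [[1.0 / k] * k for _ in range(2)]       # averaging kernel per channel
    pw = [[1.0, 0.0], [0.0, 1.0]]                # identity pointwise mixing
    cells.append((dw, pw))
y = gated_multi_kernel_conv(x, cells, gate_logits=[0.0, 0.0, 0.0])
print(y[4])                                      # interior position
```

With equal gate logits the three cells contribute equally. During training, the logits would be learned so that the network can favor short or long local contexts.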

2.2.3. CA Module

Coordinate attention is added to the CNN to focus on the important positions in the spatial dimension L’ and the important channels in the channel dimension C’, which have important impacts on the outputs. The input dimension of the coordinate attention is T × L’ × C’. The size after 1D average pooling in the T dimension is 1 × L’ × C’. A 1 × 1 convolution is used to first reduce and then restore the channel dimension, where r is the reduction ratio. Finally, the coordinate attention weights are generated by the sigmoid function.
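The steps above can be sketched in a few lines. This is a simplified reading of the description, under several assumptions that the paper does not state: a ReLU is placed between the two 1 × 1 convolutions, biases and normalization are omitted, and the resulting weights are multiplied element-wise with the input. The function name and the weight matrices are hypothetical.

```python
import math


def coordinate_attention(x, w_down, w_up):
    """Apply simplified coordinate attention to a T x L x C nested list.

    w_down: C x (C // r) matrix that reduces the channel dimension.
    w_up:   (C // r) x C matrix that restores the channel dimension.
    """
    T, L, C = len(x), len(x[0]), len(x[0][0])
    # 1D average pooling over the T dimension gives an L x C map.
    pooled = [[sum(x[t][l][c] for t in range(T)) / T for c in range(C)]
              for l in range(L)]
    weights = []
    for row in pooled:
        # 1x1 convolution to C // r channels, followed by an assumed ReLU.
        hidden = [max(0.0, sum(row[c] * w_down[c][h] for c in range(C)))
                  for h in range(len(w_down[0]))]
        # 1x1 convolution back to C channels, then sigmoid.
        logits = [sum(hidden[h] * w_up[h][c] for h in range(len(hidden)))
                  for c in range(C)]
        weights.append([1.0 / (1.0 + math.exp(-z)) for z in logits])
    # Rescale every position and channel of the input by its weight.
    out = [[[x[t][l][c] * weights[l][c] for c in range(C)]
            for l in range(L)] for t in range(T)]
    return out, weights


T, L, C, r = 2, 3, 4, 2
x = [[[float(t + l + c) for c in range(C)] for l in range(L)] for t in range(T)]
w_down = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2], [0.05, 0.05]]   # C x (C // r)
w_up = [[0.5, -0.5, 0.2, 0.1], [0.1, 0.3, -0.2, 0.4]]           # (C // r) x C
out, weights = coordinate_attention(x, w_down, w_up)
print(len(weights), len(weights[0]))   # one weight per position and channel
```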

3. Evaluation and Analysis

3.1. Experimental Setup

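For each dataset, the division ratio of training and test data is 7:3, which is a common setup in deep learning. In the training process, the loss function is the mean square error between the actual and the predicted vehicle responses, plus the L1-norm and L2-norm regularization of the model parameters:

$$\mathrm{Loss} = \frac{1}{T}\sum_{i=1}^{T}\left\lVert y_i - \hat{y}_i \right\rVert_2^2 + \lambda_1 \lVert \theta \rVert_1 + \lambda_2 \lVert \theta \rVert_2^2,$$

where T is the sequence length, $y_i$ and $\hat{y}_i$ are the actual and predicted vehicle responses, $\theta$ denotes all trainable model parameters, and $\lambda_1$ and $\lambda_2$ are regularization coefficients. The learning rate is 0.001 and the optimizer is the Adam algorithm.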
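As a small worked example of such a loss, the sketch below evaluates a mean square error with L1 and L2 penalties on scalar values. The use of the squared L2 norm and the coefficient values are assumptions, since the paper does not report them, and the function name is hypothetical.

```python
def regularized_mse(y_true, y_pred, params, lam1, lam2):
    """Mean square error plus L1 and squared-L2 penalties on the parameters."""
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    l1 = sum(abs(p) for p in params)       # L1 norm of the parameters
    l2 = sum(p * p for p in params)        # squared L2 norm (assumed form)
    return mse + lam1 * l1 + lam2 * l2


# MSE = 4/3, L1 = 1.5, squared L2 = 1.25
loss = regularized_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0],
                       params=[0.5, -1.0], lam1=0.01, lam2=0.1)
print(round(loss, 4))  # 1.4733
```

The L1 term pushes small parameters toward zero, while the L2 term discourages large parameter values. Both are common ways to limit overfitting.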

3.2. Evaluation Indices

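Four evaluation indices for model performance are employed, including the mean absolute error (MAE), root mean square error (RMSE), Theil inequality coefficient (TIC), and the correlation coefficient ($\rho$). The indices are defined as follows:

$$\mathrm{MAE} = \frac{1}{M}\sum_{i=1}^{M}\left|y_i - \hat{y}_i\right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2},$$

$$\mathrm{TIC} = \frac{\sqrt{\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2}}{\sqrt{\sum_{i=1}^{M} y_i^2} + \sqrt{\sum_{i=1}^{M} \hat{y}_i^2}},$$

$$\rho = \frac{\sum_{i=1}^{M}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{M}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{M}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}},$$

where M is the length of the test data; $y_i$ and $\hat{y}_i$ are the actual and predicted values; and $\bar{y}$ and $\bar{\hat{y}}$ are the expectations of the actual value and the predicted value of the model, respectively. MAE and RMSE reflect the absolute accuracy of the prediction: the smaller their values, the better the performance of the model. TIC and $\rho$ are relative accuracy indices. A smaller TIC (ranging from 0 to 1) means higher accuracy. $\rho$ ranges from −1 to 1, and the greater its absolute value, the higher the accuracy.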
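The four indices can be computed directly from their standard definitions, as in the following sketch. The function names and the sample values are illustrative only.

```python
import math


def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def tic(y, p):
    """Theil inequality coefficient, between 0 (perfect) and 1."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)))
    den = math.sqrt(sum(a * a for a in y)) + math.sqrt(sum(b * b for b in p))
    return num / den


def corr(y, p):
    """Pearson correlation coefficient, between -1 and 1."""
    my, mp = sum(y) / len(y), sum(p) / len(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in p)
    return cov / math.sqrt(var_y * var_p)


actual = [1.0, 2.0, 3.0, 4.0]            # synthetic actual responses
predicted = [1.1, 1.9, 3.2, 3.8]         # synthetic predictions
for name, fn in (("MAE", mae), ("RMSE", rmse), ("TIC", tic), ("rho", corr)):
    print(name, round(fn(actual, predicted), 4))
```

A prediction with the opposite sign of the actual signal gives TIC = 1 and a correlation coefficient of −1, which are the worst cases of the two relative indices.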

3.3. Comparison with Other Models

To evaluate the proposed CA-CNN-MUSE model, we build multiple models and show their test results on line 1 of the inspection-simulation dataset in Table 1. Here, the LSTM model has two stacked LSTM layers, each with 64 hidden neuron nodes. The size of its input sequence is B × 480 × C, and that of the output is B × 480 × K, where B = 32 is the batch size. The CNN-LSTM network has the same CNN module as the proposed CA-CNN-MUSE and two stacked LSTM layers. CA-CNN-LSTM is the network obtained by adding the CA module to CNN-LSTM. The CNN-MUSE network replaces the LSTM with MUSE.

To compare the performance of the different models, the accuracy indices of all vehicle responses are averaged to evaluate the model accuracy as a whole, and additional indices are used, including the number of parameters (params), FLOPs, and inference time on the test set. The following can be seen from the table:

(1) When CA is added to the CNN-LSTM model, all the accuracy indices (RMSE, MAE, correlation coefficient, and TIC) improve. When LSTM is replaced with MUSE in CNN-LSTM, all the accuracy indices also improve. Using CA and MUSE at the same time, CA-CNN-MUSE reaches the best RMSE, MAE, and TIC.

(2) When LSTM is replaced with MUSE in CNN-LSTM, the inference time is reduced although the params and FLOPs increase. The reason is that the multihead attention module in MUSE can be computed in parallel. When the CA module is added to CNN-MUSE, the inference time increases by only 0.04 s.

To sum up, the evaluation indices of CA-CNN-MUSE are superior to those of the other models.

3.3.1. Results on Different Lines

Based on the inspection-simulation dataset, we investigate the model estimation performance on different lines, as shown in Table 2. As can be seen, CA-CNN-MUSE also performs better than CNN-LSTM on line 2 and line 3.

3.4. Results of Specific Vehicle Responses
3.4.1. Results on the Inspection-Simulation Dataset

The accuracy indices of the 14 vehicle responses of line 1 in the inspection-simulation dataset are shown in Table 3. The following can be seen:

(1) The CA-CNN-MUSE model can effectively estimate vertical wheel-rail forces, wheel unloading rates, and vertical car body accelerations, and the corresponding correlation coefficients are 0.87, 0.73, and 0.96, respectively. Among these, the estimation accuracy of vertical car body acceleration is the highest.

(2) Among the vertical wheel-rail forces, the right vertical force of axle 1 has the highest estimation accuracy. The estimation accuracy of the wheel unloading rate is the same for the four wheel sets.

3.4.2. Results on the Inspection Dataset

We also train and test the proposed CA-CNN-MUSE on the track inspection dataset. The 11 track irregularity items and the vehicle running speed are used as the input of the network. We first remove the wavelength components below 2 m from the signals using wavelet decomposition and reconstruction, then feed them into the network to estimate the vehicle responses. The following results are obtained on the test set:
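The preprocessing step can be illustrated with a minimal wavelet sketch. With the 0.25 m sampling interval, the shortest resolvable wavelength is 0.5 m, so the first detail level nominally covers wavelengths of 0.5 m to 1 m and the second covers 1 m to 2 m. Discarding these two detail levels therefore removes components below about 2 m. The paper does not name the wavelet family, so the Haar wavelet used here is an assumption chosen for brevity, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(s):
    """One level of Haar decomposition into approximation and detail."""
    approx = [(s[i] + s[i + 1]) / SQRT2 for i in range(0, len(s), 2)]
    detail = [(s[i] - s[i + 1]) / SQRT2 for i in range(0, len(s), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one level of Haar decomposition."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def remove_short_wavelengths(signal, levels):
    """Discard the finest `levels` detail bands and reconstruct the signal."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)          # detail band is dropped
    for _ in range(levels):
        approx = haar_inverse_step(approx, [0.0] * len(approx))
    return approx


dx = 0.25                                            # sampling interval in m
cutoff = 2.0                                         # shortest wavelength kept, m
levels = math.ceil(math.log2(cutoff / (2 * dx)))     # 2 detail levels to drop

long_wave = [math.sin(2 * math.pi * i * dx / 32.0) for i in range(64)]
short_wave = [0.5 * (-1) ** i for i in range(64)]    # 0.5 m wavelength
noisy = [a + b for a, b in zip(long_wave, short_wave)]
filtered = remove_short_wavelengths(noisy, levels)
print(levels, len(filtered))
```

With the Haar wavelet, dropping two detail levels amounts to replacing every block of four samples (1 m) by its mean, so the 0.5 m component in this example is removed exactly. A smoother wavelet would give a sharper separation between the retained and removed bands.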

Table 4 summarizes the accuracy indices of the CA-CNN-MUSE model. As can be seen, the CA-CNN-MUSE model can effectively estimate the vertical wheel-rail force and the vertical car body acceleration. Compared with the vertical responses, the estimation of the lateral vehicle responses is less accurate.

To further analyze the model performance at different wavelengths, the waveforms of the right vertical force and vertical car body acceleration are illustrated in Figures 3(a)-3(b), and the waveforms of the right lateral force and lateral car body acceleration are illustrated in Figures 3(c)-3(d), where the length of the track section is 250 m. The proposed model succeeds in predicting the vertical force and vertical car body acceleration. The main features of the lateral vehicle responses can also be captured by the proposed method, yet the results are not as good as the vertical ones.

The power spectral densities (PSDs) of the right vertical force and vertical car body acceleration are illustrated in Figures 4(a)-4(b), and the PSDs of the right lateral force and lateral car body acceleration are illustrated in Figures 4(c)-4(d). As can be seen, the predicted PSDs of the right vertical force and car body acceleration fit the actual PSDs well at wavelengths above 3 m.

4. Conclusion

To relate track geometry irregularities to vehicle responses and thereby improve track quality assessment standards and track maintenance work, this paper proposed a CA-CNN-MUSE model to predict vehicle response. We use a multiscale self-attention mechanism to replace the LSTM structure, which is dominant in this kind of task. Besides, we introduced the lightweight coordinate attention mechanism into the CNN to focus on important interchannel relationships and important mileage information. The results of this paper show the following:

(1) CA-CNN-MUSE has higher prediction accuracy than LSTM and CNN-LSTM and a faster inference speed than CNN-LSTM. The waveforms and PSDs of vertical wheel-rail forces and car body acceleration estimated by CA-CNN-MUSE agree well with the actual ones.

(2) CA-CNN-MUSE is applicable both to the multibody simulation model of a vehicle system and to the measured data of actual high-speed lines.

The proposed model succeeds in representing the vertical wheel-rail force and car body acceleration, yet the estimates of the lateral wheel-rail force and car body acceleration are not as good as the vertical results. The estimation of lateral wheel-rail force and car body acceleration is a difficult problem, and we will continue to investigate it.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (No. 52278465) and the Scientific Research Plan of China Railway (No. P2021T013).