Abstract

Assessing the health status of the running gear system is essential to the normal operation of high-speed trains. Under actual working conditions, unknown interferences and random noise arise during monitoring, which makes an accurate health status assessment of the running gear system difficult. In this paper, a new data-driven model based on slow feature analysis and a support tensor machine (SFA-STM) is proposed; it addresses unknown interference and random noise by removing the slow features with the fastest instantaneous change. First, the relationship between the various statuses of the running gear system is analyzed. To remove the random noise and unknown interferences in the running gear system under complex working conditions and to extract more accurate data features, the SFA method is used to extract the slowest features, which reflect the general trend of system changes in the monitoring data of running gear systems of high-speed trains. Second, the slowness data are constructed in tensor form so that the STM can assess the health status accurately. Finally, actual monitoring data from the running gear system of a high-speed train are used to verify the effectiveness and accuracy of the model, which is compared with traditional models. The maximum sum of squared residuals (SSR) value was reduced by 16 points, indicating that the SFA-STM method has higher assessment accuracy.

1. Introduction

With the continuous improvement of the safety and stability of high-speed trains, the assessment of the health status of the running gear system has received extensive attention in recent years [1–3]. As a critical component that withstands and transmits various loads from the vehicle body and the track while mitigating their dynamic effects, the running gear system is prone to failure after a long period of high-speed operation. A health status assessment of the running gear is therefore required to improve its safety and reliability. However, the working environment of the running gear system contains many unknown interferences and considerable noise; thus, it is difficult to accurately assess the status of the running gear system using only the original monitoring data. Therefore, to reduce the influence of unknown interference and noise under complex conditions and to enhance the accuracy of the assessment, in this study, we designed a data-driven health assessment model to ensure the safe, stable, and reliable operation of running gear systems.

The running gear system of high-speed trains is a complicated electromechanical system composed of many components. Any component may suffer from cracks, corrosion, leakage, and other faults, which accelerate its degradation [4–6]. The extrusion wear of the wheelset and the stiffness degradation of the steel spring also lead to the degradation of the running gear system. Additionally, the monitoring data collected by the sensors under interference conditions contain noise, which has a great impact on the health status assessment of the running gear system. Therefore, a health assessment of the running gear system has the following two characteristics:
(1) The monitoring data of the running gear system contain a lot of random noise caused by sensor factors and unknown interferences caused by internal or external environmental factors.
(2) The structural complexity caused by the close coupling of components in the running gear system is difficult to describe with precise mathematical models obtained through mechanism analysis, which limits the use of analytical models.

Currently, the common methods for health status assessment of complex electromechanical systems are mainly divided into three categories: methods based on semiquantitative information [7–9], data-driven methods [10–13], and model-based methods [14–16]. With the increase in the number of sensors in the running gear system, it has become easy to obtain a large amount of data that reflect the actual status of the system. Additionally, with the continuous development of feature extraction methods in recent years, data-driven methods have been able to extract feature information from massive amounts of data. Many experts and scholars now use data-driven methods to assess the health status of complex electromechanical systems. In [17], the parity space method was combined with the recursive least squares algorithm to propose a fault detection and assessment method for a quadrotor UAV based on a linear time-varying system. Using the analytic hierarchy process (AHP), a traditional decision-making assessment method [18], Qian et al. [19] proposed an electric power dispatching control scheme based on health status assessments, in which the AHP was used to assess the health status of wind turbines. This method can improve the operation efficiency of a wind farm to the greatest extent and reduce the fatigue load of faulty wind turbines. Building on continuous improvements of the hidden Markov model (HMM) [20, 21], Liu et al. [22] proposed a discrete hidden Markov model (DHMM) based on K-means clustering, in which the K-means clustering algorithm filters out the sample points that are inconsistent with the actual class labels, so that faults can be better detected and isolated. Recently, deep learning technology has been widely used in status assessment [23, 24], and Liang et al. [25] proposed a convolutional neural network (CNN) method that can be applied to the status assessment of a gearbox. Using a massive amount of training data, a high assessment accuracy can be achieved. However, none of the above methods take into account the problem of accurately extracting features from the original data. Too many data features generate redundant variables and increase the complexity of the algorithm, thus affecting the accuracy of the assessment model.

For this reason, the Hilbert–Huang transform (HHT) and the support vector machine (SVM) have been used in an intelligent engine fault diagnosis (EFD) method, which uses the HHT to extract features effectively and provides engine fault diagnosis [26]. Song et al. [27] combined statistical filtering (SF) and the wavelet packet transform (WPT) to propose a new method of signal feature extraction and fault diagnosis for a low-speed mechanical system. Principal component analysis (PCA) is the most classical method used for feature extraction [28], and one study used PCA to construct subspaces in different directions of the principal components, dividing the original feature space into several subspaces automatically, and developed a monitoring and assessment scheme for the model. Moreover, Jiang et al. [29] presented a distributed fault detection and isolation method based on fault-related variable selection and Bayesian reasoning. They use an optimization algorithm to determine the optimal subset of variables for each fault, build a sub-PCA model in each subset, and combine the monitoring results of each subset through Bayesian reasoning. This method significantly reduces redundancy and complexity and thus improves the monitoring performance. To sum up, many data-based status assessment methods have improved the assessment accuracy through feature extraction; however, in actual work environments, complex electromechanical systems are subject to many unknown disturbances and considerable noise. Under these conditions, traditional feature extraction methods have difficulty extracting features accurately, which affects the accuracy of the assessment model.

Therefore, a new process monitoring method based on SFA was applied in [30, 31] to extract the features with the slowest change from the original monitoring data, and process monitoring based on the slowness data was used to distinguish dynamic anomalies from deviations in normal operating conditions. The purpose of the SFA method in data processing is to find instantaneous scalar input-output mapping functions from multidimensional monitoring data so that the output slowness data change as slowly as possible. The output carries information reflecting the general trend of the system change, and the slowness data are filtered according to their slowness. The data that are filtered out usually represent the short-term noise of complex system changes, and the remaining data more accurately reflect the general trend of system change. Therefore, this paper proposes an SFA-STM method to assess the health status of the running gear systems of high-speed trains. The STM is a classification algorithm based on tensor data, which is well suited to nonlinear and non-Gaussian problems. In [32, 33], bearing monitoring data were constructed into tensors and an STM-based method was applied to the fault diagnosis of the bearings. Because the tensor data form retains the spatial and temporal structure of the original data, the data are fully utilized, which prevents the information loss caused by vectorizing multidimensional features and provides good diagnostic accuracy. In [34], the authors proposed a method based on a hybrid support tensor machine for the diagnosis and localization of open circuit faults of modular multilevel converters, and the classification accuracy was better than that of the support vector machine. Therefore, in this study, we used SFA to extract the slowness data from the monitoring data of the running gear system and to eliminate the noise data. Then, the slowness data reflecting the general trend of the process change were constructed into a tensor and input into the STM model to enhance the accuracy of the STM for the health status assessment of the running gears of high-speed trains. This addresses the problem that the many unknown interferences and random noises in monitoring data under actual working conditions affect the assessment results.

This paper is arranged as follows. Section 2 introduces the structure of the running gear system of a high-speed train and describes the problem. In Section 3, the health assessment model for the running gear system in a high-speed train based on SFA-STM is proposed. Section 4 describes a practical case to verify the method proposed in this paper. Section 5 provides the conclusions of this paper.

2. Preliminaries

This section briefly introduces the running gear system and describes the problems to be solved.

2.1. Description of Running Gear System

As shown in Figure 1(a), the running gear system is a complex coupling system composed of multiple components that is located between the high-speed train body and the track and pulls the vehicle along the track. As the core component of a high-speed train, it is mainly composed of a frame, axle box, suspension device, driving device, brake device, and sensor. During the manufacturing phase, many sensors are preinstalled in the running gear system to monitor the status of the running gear system. Each sensor is integrated with multiple sensor units to monitor different physical quantities and collect different types of data, such as temperature, vibrations, and impact. Therefore, a single sensor is also called a composite sensor component. A total of 11 composite sensors are installed in the monitoring bearing support area, and the installation direction should be consistent with the direction of the impact signal. The specific distribution location is shown in Figure 1(b). A1–A4: measurement point of the axle box bearing. B1–B3: motor bearing measurement points (2 is the measurement point of the motor drive bearing and 3 is the measurement point of the motor rotor). C1–C4: gearbox measurement point (3 and 4 are measuring points of small and large gears at the motor end). The distribution of sensors is very complex, and the many sensors can easily obtain considerable monitoring data reflecting the status of the running gear system during the process of train operation. However, because of the complex monitoring environment, the considerable noise and interference factors cannot be ignored; thus, it is particularly important to remove noise and interference from the monitoring data.

To better determine the health status of a running gear system, it is necessary to monitor the temperature, vibrations, impact, speed, and other physical quantities of the running gear system. For this reason, in this study, we analyzed three subsystems of the running gear system, including the axle box, gearbox, and motor, and we evaluated the status of the running gear system by using the monitoring data from these three subsystems.

2.2. Problem Description

When a high-speed train runs at high speed for a long time, the gearbox bearing is prone to cracks and deformation. Additionally, the wheelset of the running gear system is squeezed and worn through prolonged contact with the ground, and it is corroded by different environments while remaining at high temperature [35]. The air spring may also leak gas, and the stiffness of the shock absorber steel spring can degrade. All these factors cause the degradation of high-speed trains. With a large number of sensors distributed across the system, the running gear system status can be determined from monitoring data. However, because the engineering monitoring environment and the running environment are complex, the monitoring data are mixed with large amounts of random noise and uncertain disturbances. This seriously degrades the monitoring data and reduces the accuracy with which they reflect the actual running status. There are two key reasons for this situation [34]:
(1) Environmental factors: these are mainly due to the operation environment of high-speed trains and the sensor environment. During high-speed running, trains are inevitably affected by road condition factors, interior factors, climate factors, and other disturbances, which cause the sensor monitoring data to fail to reflect the actual status of the running gear system over a short period. This can affect the accuracy of the status assessment of a running gear system.
(2) Sensor factors: these are mainly determined by the sensor's principle, material, manufacturing process, and other physical characteristics. After long working hours, the quality and function of a sensor are reduced to different degrees, and a large amount of random noise is generated, which continuously affects the monitoring performance. As a result, some erroneous information is recorded and accurate monitoring data cannot be obtained, which seriously affects the accuracy of the status assessment of the running gear system.

According to the above analysis, monitoring data are mainly affected by these two factors, and traditional feature extraction methods have difficulty removing the influence of uncertain disturbances and random noise. A new approach is therefore required to address this issue.

3. Methodology

To accurately assess the health status of the high-speed train running gear system, a health status assessment model based on SFA-STM is proposed. As shown in Figure 2, the model structure is divided into three parts. First, the features reflecting the general trend of system changes are selected as the input for the SFA. Second, the monitoring data processed by SFA are constructed into a tensor. Third, the STM health status assessment model is constructed using the tensor.

3.1. SFA

The data reflecting the changing trend of high-speed train running gear systems are masked by random noise and unknown interference, and using these data directly to assess the health status of a running gear system seriously affects the accuracy of the assessment results. SFA is therefore used to reduce the noise and interference.

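The purpose of SFA data processing is to determine an instantaneous scalar input-output mapping function from the multidimensional time input signal so that the output signal changes as slowly as possible while carrying the information that reflects the general trend of the system change. Mathematically, SFA considers a given $m$-dimensional input signal:

$$\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_m(t)]^{T}. \tag{1}$$

Find a set of functions $g_j(\cdot)$ that generate the slow features $s_j(t) = g_j(\mathbf{x}(t))$ so that

$$\min_{g_j(\cdot)} \Delta(s_j) := \langle \dot{s}_j^{2} \rangle_t \tag{2}$$

under the constraints

$$\langle s_j \rangle_t = 0, \tag{3}$$

$$\langle s_j^{2} \rangle_t = 1, \tag{4}$$

$$\forall i \neq j: \ \langle s_i s_j \rangle_t = 0, \tag{5}$$

where $\mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_m(t)]^{T}$ is the series of slow features, and $\Delta(\cdot)$ is a measure of the slowness of $s_j$. $\langle \cdot \rangle_t$ is the time average, and $\dot{s}_j$ is the first derivative of $s_j$ with respect to time.

For SFA, each slow feature is a linear combination of the input variables:

$$s_j(t) = \mathbf{w}_j^{T} \mathbf{x}(t), \tag{6}$$

where $\mathbf{w}_j$ represents the coefficient vector. The mapping from $\mathbf{x}$ to $\mathbf{s}$ can be abbreviated as follows:

$$\mathbf{s} = \mathbf{W}\mathbf{x}, \tag{7}$$

where $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m]^{T}$ is the coefficient matrix to be optimized by SFA.

Substituting equation (6) into constraint (3), we obtain

$$\langle s_j \rangle_t = \mathbf{w}_j^{T} \langle \mathbf{x} \rangle_t = 0. \tag{8}$$

If the input variable $\mathbf{x}$ is scaled to zero mean ahead of time, constraint (3) is automatically satisfied.

To solve for the slow features $\mathbf{s}$, the following steps are required. First, the singular value decomposition (SVD) of the covariance matrix $\langle \mathbf{x}\mathbf{x}^{T} \rangle_t$ gives

$$\langle \mathbf{x}\mathbf{x}^{T} \rangle_t = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}. \tag{9}$$

Next, the whitening transformation can be expressed as

$$\mathbf{z} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^{T}\mathbf{x} = \mathbf{Q}\mathbf{x}, \tag{10}$$

where $\mathbf{Q} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^{T}$ is the whitening matrix. It is easy to verify that $\langle \mathbf{z}\mathbf{z}^{T} \rangle_t = \mathbf{I}$ and $\langle \mathbf{z} \rangle_t = \mathbf{0}$, so the purpose of SFA translates into finding a matrix $\mathbf{P} = \mathbf{W}\mathbf{Q}^{-1}$ that satisfies $\mathbf{s} = \mathbf{P}\mathbf{z}$ because

$$\mathbf{s} = \mathbf{W}\mathbf{x} = \mathbf{W}\mathbf{Q}^{-1}\mathbf{z} = \mathbf{P}\mathbf{z}. \tag{11}$$

Then, constraints (4) and (5) can be written compactly as follows:

$$\langle \mathbf{s}\mathbf{s}^{T} \rangle_t = \mathbf{I}. \tag{12}$$

Substituting equation (11) into equation (12) gives

$$\langle \mathbf{s}\mathbf{s}^{T} \rangle_t = \mathbf{P}\langle \mathbf{z}\mathbf{z}^{T} \rangle_t \mathbf{P}^{T} = \mathbf{P}\mathbf{P}^{T} = \mathbf{I}, \tag{13}$$

so $\mathbf{P}$ is an orthogonal matrix. Therefore, the optimization problem of SFA reduces to finding the orthogonal matrix $\mathbf{P}$ that minimizes $\langle \dot{s}_j^{2} \rangle_t$. This problem can be solved by the SVD of the covariance matrix $\langle \dot{\mathbf{z}}\dot{\mathbf{z}}^{T} \rangle_t$:

$$\langle \dot{\mathbf{z}}\dot{\mathbf{z}}^{T} \rangle_t = \mathbf{P}^{T}\boldsymbol{\Omega}\mathbf{P}. \tag{14}$$

Therefore, the orthogonal eigenvectors $\mathbf{p}_j$, with $\mathbf{P} = [\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m]^{T}$, and the corresponding eigenvalues $\omega_j$, with $\boldsymbol{\Omega} = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_m)$, can be obtained and verified:

$$\langle \dot{s}_j^{2} \rangle_t = \mathbf{p}_j^{T} \langle \dot{\mathbf{z}}\dot{\mathbf{z}}^{T} \rangle_t \mathbf{p}_j = \omega_j. \tag{15}$$

Finally, the transformation matrix can be calculated as

$$\mathbf{W} = \mathbf{P}\mathbf{Q} = \mathbf{P}\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^{T}, \tag{16}$$

so the slow features can be calculated as

$$\mathbf{s} = \mathbf{W}\mathbf{x} = \mathbf{P}\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^{T}\mathbf{x}. \tag{17}$$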
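The linear SFA steps described above (centering, whitening with a matrix Q, eigendecomposition of the covariance of the differenced whitened data to obtain an orthogonal matrix P, and the final mapping W = PQ) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `linear_sfa` is hypothetical, the time derivative is approximated by a first difference with unit sampling interval, and the input covariance is assumed to be full rank.

```python
import numpy as np

def linear_sfa(X):
    """Linear SFA on X of shape (T, m): rows are time samples, columns are variables.

    Returns (S, W, omega): slow features S (one column per feature), the
    transformation matrix W = P Q, and the slowness values omega in
    ascending order (slowest feature first).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # zero mean, so constraint (3) holds
    cov = Xc.T @ Xc / len(Xc)                # <x x^T>_t
    lam, U = np.linalg.eigh(cov)             # cov = U diag(lam) U^T
    Q = np.diag(lam ** -0.5) @ U.T           # whitening matrix (assumes lam > 0)
    Z = Xc @ Q.T                             # whitened data z = Q x
    Zdot = np.diff(Z, axis=0)                # assumption: first difference ~ dz/dt
    cov_dot = Zdot.T @ Zdot / len(Zdot)      # <zdot zdot^T>_t
    omega, V = np.linalg.eigh(cov_dot)       # eigenvalues returned in ascending order
    P = V.T                                  # rows of P are the eigenvectors
    W = P @ Q                                # transformation matrix
    S = Xc @ W.T                             # slow features s = W x
    return S, W, omega
```

Because `numpy.linalg.eigh` returns eigenvalues in ascending order, the first column of `S` is the slowest feature, and the sample covariance of `S` is the identity up to rounding, which matches the unit-variance and decorrelation constraints.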

The statistical characteristics of the slow feature can be expressed by the following equation:

According to different slowness features, the slowness features are extracted according to the following equation:where represents the number of elements in a set and is the slowness measure. Set , and , here as the upper quantile of the set , is an evaluation criterion for the slowness of , which can filter out slowness of the largest , expressed as . These slow features reflect the general trend of system changes in the running gear system. For the slow features screened out, they represent the short-term noise and unknown disturbance of the system changes. The above proofs have been given in [36].
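The screening step can be illustrated with a short sketch. The exact form of the selection criterion is not reproduced here, so the code below rests on assumptions: slowness is estimated as the mean squared first difference of each standardized signal, the threshold is taken as the upper-q quantile of the slowness values of the input variables, and slow features whose slowness exceeds that threshold are treated as noise and discarded. The names `slowness` and `select_slow_features` and the default `q=0.1` are illustrative choices.

```python
import numpy as np

def slowness(signals):
    """Slowness of each column: mean squared first difference of the standardized signal."""
    X = np.asarray(signals, dtype=float)
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.mean(np.diff(Xn, axis=0) ** 2, axis=0)

def select_slow_features(S, X, q=0.1):
    """Keep the columns of S that are no faster than the upper-q quantile of the inputs.

    S: candidate slow features, shape (T, k).
    X: original input variables, shape (T, m).
    Returns (kept_features, boolean_mask).
    """
    S = np.asarray(S, dtype=float)
    threshold = np.quantile(slowness(X), 1.0 - q)   # assumed screening threshold
    keep = slowness(S) <= threshold
    return S[:, keep], keep
```

Features that pass the mask change slowly and are kept as the trend-carrying part; the rejected columns correspond to the fast-varying part that the text attributes to short-term noise and unknown disturbances.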

3.2. STM Model

Using the SFA model in the previous section, data reflecting the general trend of the system change of the running gear system was successfully extracted. Then, the nonlinear model between the input and output needs to be built by the STM, to achieve an accurate assessment of the health status of running gear systems of high-speed trains.

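Consider a binary classification (dichotomy) problem, where the training set is

$$\{\mathcal{X}_i, y_i\}_{i=1}^{N}, \tag{20}$$

where $N$ represents the number of samples, $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_M}$ represents the input data, $y_i$ represents the class label of the data, and $y_i \in \{+1, -1\}$. The optimization problem of the STM model is

$$\min_{\mathbf{w}_k|_{k=1}^{M},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\left\| \mathbf{w}_1 \circ \mathbf{w}_2 \circ \cdots \circ \mathbf{w}_M \right\|_F^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i\left(\mathcal{X}_i \prod_{k=1}^{M} \times_k \mathbf{w}_k + b\right) \ge 1 - \xi_i, \ \xi_i \ge 0. \tag{21}$$

Based on the idea of the supervised tensor learning framework, the STM dichotomy model is decomposed into $M$ suboptimization problems, and the $j$-th suboptimization problem, after fixing $\mathbf{w}_k$ for all $k \neq j$, is

$$\min_{\mathbf{w}_j,\, b,\, \boldsymbol{\xi}} \ \frac{\eta}{2}\left\| \mathbf{w}_j \right\|^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i\left(\mathbf{w}_j^{T}\left(\mathcal{X}_i \prod_{k \neq j} \times_k \mathbf{w}_k\right) + b\right) \ge 1 - \xi_i, \ \xi_i \ge 0, \tag{22}$$

with $\eta = \prod_{k \neq j} \|\mathbf{w}_k\|^{2}$.

In equation (22) above, $\mathbf{w}_j$ is the weight of the hyperplane, $b$ is the offset, $\xi_i$ are the relaxation variables, and $C$ is the penalty factor. What we need to solve are $M$ optimization problems with a form similar to equation (22); these problems can be solved through the alternating projection optimization method, which finally gives the decision function of the model as

$$y(\mathcal{X}) = \mathrm{sign}\left(\mathcal{X} \prod_{k=1}^{M} \times_k \mathbf{w}_k + b\right). \tag{23}$$

The alternating projection algorithm takes as input the training sample set $\{\mathcal{X}_i, y_i\}_{i=1}^{N}$ and the iteration control threshold $\varepsilon$, and it outputs $\mathbf{w}_k|_{k=1}^{M}$ and $b$.
Step 1: $\mathbf{w}_k|_{k=1}^{M}$ is initialized as a unit vector in $\mathbb{R}^{I_k}$.
Step 2: iterate through steps 3 and 4 until the algorithm converges.
Step 3: for $j = 1, \ldots, M$, fix $\mathbf{w}_k$, $k \neq j$, solve equation (22), and get $\mathbf{w}_j$.
Step 4: stop the calculation if

$$\sum_{k=1}^{M}\left[\left|\mathbf{w}_{k,t}^{T}\mathbf{w}_{k,t-1}\right| \left\|\mathbf{w}_{k,t}\right\|^{-2} - 1\right] \le \varepsilon \tag{24}$$

is met. Otherwise, go to step 2. Here, $\mathbf{w}_{k,t-1}$ and $\mathbf{w}_{k,t}$ represent the values of the projection weight vector at the previous and current iteration steps, respectively.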
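The alternating projection loop can be sketched as follows for a rank-one weight tensor with one weight vector per mode. This is a hedged illustration, not the authors' solver: each subproblem is assumed to be a linear soft-margin SVM on the tensor contracted with all other weight vectors, and it is solved here by plain subgradient descent on an averaged hinge loss as a simple stand-in for a quadratic programming solver. All function names, the learning rate, and the epoch count are hypothetical.

```python
import numpy as np

def contract_except(Xs, ws, j):
    """Contract a batch of tensors (axis 0 = sample) with every weight vector except mode j."""
    out = Xs
    # Go from the last mode down so the remaining axis numbers stay valid.
    for k in reversed(range(len(ws))):
        if k != j:
            out = np.tensordot(out, ws[k], axes=([k + 1], [0]))
    return out                                   # shape (N, I_j)

def fit_linear_svm(V, y, eta, C, lr=0.05, epochs=300):
    """Subgradient descent on (eta/2)||w||^2 + (C/N) * sum of hinge losses."""
    w, b = np.zeros(V.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (V @ w + b) < 1             # samples inside or beyond the margin
        hinge_grad = (y[active][:, None] * V[active]).sum(axis=0) / len(y)
        w -= lr * (eta * w - C * hinge_grad)
        b -= lr * (-C * y[active].sum() / len(y))
    return w, b

def fit_stm(Xs, y, C=1.0, eps=1e-4, max_iter=30):
    """Xs: array (N, I_1, ..., I_M); y: array of +1/-1 labels."""
    ws = [np.ones(d) / np.sqrt(d) for d in Xs.shape[1:]]    # Step 1: unit vectors
    b = 0.0
    for _ in range(max_iter):                                # Step 2: outer loop
        prev = [w.copy() for w in ws]
        for j in range(len(ws)):                             # Step 3: one mode at a time
            V = contract_except(Xs, ws, j)
            eta = np.prod([w @ w for k, w in enumerate(ws) if k != j])
            ws[j], b = fit_linear_svm(V, y, eta, C)
        change = sum(abs(w @ p) / max(w @ w, 1e-12) - 1 for w, p in zip(ws, prev))
        if abs(change) <= eps:                               # Step 4: convergence check
            break
    return ws, b

def stm_decision(Xs, ws, b):
    """Sign of the tensor contracted with all weight vectors plus the offset."""
    out = Xs
    for w in reversed(ws):
        out = out @ w                            # contracts the last remaining mode
    return np.sign(out + b)
```

Replacing `fit_linear_svm` with an exact SVM solver would bring the sketch closer to the formulation in the text; the outer loop and the stopping rule would stay the same.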

For the STM multiclassification model, the running gear system status label is defined, and the label is composed of information classes. One-against-one (OAO) strategy is adopted to construct binary STM models to model all possible paired classifications. Then, the decision function between each possible pair of classes and is obtained through dichotomy, and the fractional functionof the sum of the number of labels allocated to the category by the sample is calculated. The classes with the highest scores were considered predictive labels for unclassified samples.
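The one-against-one voting described above can be sketched as follows. The interface is an assumption made for illustration: `pair_decision` is a hypothetical dictionary that maps each class pair to a binary decision function whose positive output votes for the first class of the pair and whose non-positive output votes for the second.

```python
from itertools import combinations

def oao_predict(sample, classes, pair_decision):
    """Predict a label by one-against-one voting over all class pairs.

    pair_decision[(p, q)](sample) > 0 counts as a vote for p, otherwise for q.
    Ties are broken in favor of the class listed first in `classes`.
    """
    votes = {c: 0 for c in classes}
    for p, q in combinations(classes, 2):
        winner = p if pair_decision[(p, q)](sample) > 0 else q
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])
```

With three status classes, this builds 3(3 - 1)/2 = 3 pairwise decisions, and the class collecting the most votes becomes the predicted label.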

3.3. The Steps of Health Status Assessment Model for High-Speed Train Running Gear System

As shown in Figure 3, according to the algorithm described above, the steps of the health status assessment algorithm for the high-speed train running gear system are summarized as follows:
Step 1: use SFA to extract the slow features from the high-speed train running gear system monitoring data.
Step 1.1: normalize the sample data.
Step 1.2: whiten the data through equations (9) and (10).
Step 1.3: take the time derivative of the whitened data matrix.
Step 1.4: use equations (14) and (16) to obtain the mapping matrix.
Step 1.5: obtain the slowness data by equation (17).
Step 1.6: extract the slow features according to equation (19).
Step 2: construct the filtered slowness data matrix of the running gear system into the fourth-order tensor form, as shown in Figure 4.
Step 3: construct the health status assessment model of the high-speed train running gear system based on STM.
Step 4: transform the STM multiclassification model into multiple binary classification models.
Step 4.1: divide the optimization problem of STM into suboptimization problems.
Step 4.2: solve the suboptimization problems with the alternating projection algorithm and obtain the decision function.
Step 5: assess the system through equations (23) and (25).

In the assessment model, the motor temperature, gearbox vibration, and impact of external ring friction of the retaining frame in the axle box were set as the features to assess the health of the running gear system. The specific assessment process will be described through cases in Section 4.

4. Case Study

To verify the accuracy of the SFA-STM assessment model proposed in this paper, this section will describe the high-speed train running gear system as an example for experimental verification. For data collection of the running gear system, the axle box, gearbox, and motor monitoring data of No. 2 carriage and position running gear system of the train during a certain month were selected. To ensure that the collected data was the data when the high-speed train was in operation, the monitoring data with speeds above 1000 r/min were screened to verify the training data and test data for the model, and a total of six monitoring indicators, including temperature, vibrations, and impact were selected for positions 1 and 2, respectively. Owing to the influence of the operating environment on the running gear system of high-speed trains during actual operation and considering the complexity of parameters, such as weight, center of gravity, and the suspension of each component, the health status assessment criteria for the running gear system in the high-speed train were set as “normal,” “general,” and “bad,” as shown in Table 1.

Normal: under this condition, all parts of the running gear system worked normally, fasteners were stable and not loose, and all indicators were within factory requirements. The average temperature was less than 30°C, the average vibration peak was less than 22 Hz, the average temperature and amplitude were at very low levels, and the impact data were also at normal levels, thus ensuring the safe and smooth operation of the high-speed train.

General: under this condition, all parts of the running gear system still worked normally, but the parts were slightly worn and deformed, and all indicators were slightly higher than the factory requirements. The average temperature was in the range of 30–35°C, and the average vibration peak was in the range of 22–25 Hz. Under this condition, the operation of the high-speed train is still unaffected, but the appearance, working status, and overall performance of each part should be checked, and preventive and corrective maintenance work should be performed promptly.

Bad: under this condition, although all parts of the running gear system could still maintain normal operation, the status of some parts was not good, the operation function decreased, and the critical point of health and failure had been reached. The average temperature was higher than 35°C, and the average vibration peak was above 25 Hz. Under this condition, the system should be completely overhauled to replace or repair the parts with a higher degree of damage to improve the safety of the high-speed train.
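The numeric thresholds quoted for the three grades can be written as a small rule, shown below as a sketch. How the authors combine the indicators is not stated, so two assumptions are made: the overall grade is taken as the worse of the temperature grade and the vibration grade, and the impact data are left out because no numeric limits are given for them. The function names are hypothetical.

```python
def grade(value, low, high):
    """Return 0 below `low`, 1 between `low` and `high` inclusive, 2 above `high`."""
    if value < low:
        return 0
    if value <= high:
        return 1
    return 2

def health_status(avg_temp_c, avg_vib_peak_hz):
    """Map average temperature (deg C) and average vibration peak (Hz) to a status label."""
    levels = ("normal", "general", "bad")
    # Assumption: the worse of the two indicator grades decides the overall status.
    worst = max(grade(avg_temp_c, 30.0, 35.0), grade(avg_vib_peak_hz, 22.0, 25.0))
    return levels[worst]
```

For example, `health_status(32, 20)` gives `"general"` because the temperature falls in the 30–35°C band even though the vibration peak is below 22 Hz.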

4.1. Data Preprocessing

Because the actual monitoring data of high-speed train running gear systems contain a large number of abnormal environmental noise points, the data must be preprocessed. As shown in Figure 5, the short-term fluctuations of the temperature, vibration, and impact data are large, and there are many repeated and abnormal data points. Therefore, as shown in Figure 6, the temperature, vibration, and impact monitoring data of the running gear at positions I and II of the train were preprocessed. The relevant outliers were filtered via mean filtering, and the data were finally compressed to 800 groups. The trend chart is shown in Figure 7.
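The preprocessing described above can be sketched as a moving-average (mean) filter followed by block averaging down to a fixed number of groups. The window length and the block-averaging scheme are assumptions made for illustration; the text specifies only that mean filtering was used and that 800 groups remained.

```python
import numpy as np

def mean_filter(x, window=5):
    """Moving-average filter with edge padding; `window` is assumed to be odd."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    padded = np.pad(x, window // 2, mode="edge")   # keeps the output length equal to len(x)
    return np.convolve(padded, kernel, mode="valid")

def compress(x, n_groups=800):
    """Reduce a series to `n_groups` values by averaging consecutive chunks."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_groups)
    return np.array([chunk.mean() for chunk in chunks])
```

Calling `compress` with `n_groups=400` on an 800-point series averages each pair of adjacent points.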

The temperature sensor was in close contact with the motor element; thus, the external interference was relatively small, whereas the vibration and impact sensors showed more obvious uncertain disturbances and random noise owing to the influence of the train speed and acceleration. Although the monitoring data were preprocessed to filter out some of the abnormal points, the data still included noise and disturbances, which could not be removed by preprocessing alone. Thus, the SFA method was needed for further data processing so that the data reflected the actual health status of the high-speed train.

4.2. Assessment Model

The monitoring data of temperature, vibration, shock, and a total of six characteristics at position I and II were normalized. The slowness data is obtained through the SFA, and the coefficient matrix of SFA optimization isthrough equation (19), the feature, , was smaller than the maximum of 0.1 times, which was selected to screen out four slow features that reflected the overall changing trend of high-speed trains.

After SFA, the 800 sets of monitoring data with six features were transformed into 800 sets of slowness data with four features, which were then constructed into a fourth-order tensor containing 400 sets of data. Mode 1 of the tensor holds the first and second slow features, mode 2 holds the slowness data of two adjacent mileage points, mode 3 holds the third and fourth slow features, and mode 4 indexes the 400 sets of slowness data. This construction is shown in Figure 8.
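One possible way to build such a tensor is sketched below. The exact index layout is an assumption: the four slow features are arranged as a 2×2 block (features 1 and 2 in the first row, features 3 and 4 in the second), two adjacent mileage points are stacked along their own mode, and the last mode indexes the resulting samples. The function name `build_tensor` is hypothetical.

```python
import numpy as np

def build_tensor(S):
    """Turn slowness data of shape (n, 4) into a tensor of shape (2, 2, 2, n // 2).

    Axes of the result: (feature-block row, mileage point within the pair,
    feature-block column, sample index).
    """
    S = np.asarray(S, dtype=float)
    n_pairs = S.shape[0] // 2
    paired = S[: 2 * n_pairs].reshape(n_pairs, 2, 4)   # [sample, mileage point, feature]
    blocks = paired.reshape(n_pairs, 2, 2, 2)          # feature index = 2 * row + column
    return blocks.transpose(2, 1, 3, 0)
```

With 800 rows of four slow features, the result has shape (2, 2, 2, 400), so each of the 400 samples is a 2×2×2 tensor.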

The 400 groups of the fourth-order tensor data, which have been constructed, were divided into 150 training data and 250 test data; then, 150 groups of training data were divided into three groups according to the actual status to construct the training set of the STM model. Finally, the decision functions , , and between the status of each class were obtained through the STM dichotomy model. The corresponding minimization of energy function is , , and , respectively.

To verify the accuracy of the model, 250 groups of samples to be tested were input into the three decision functions obtained in the STM model, and the class with the highest score for each of them, namely, the prediction label, was calculated through equation (25), to complete the health status assessment of the running gear system.

As shown in Figure 9, most test samples are classified accurately. Although a few samples were misclassified, it can be concluded that the SFA-STM model can accurately assess the health status of the high-speed train running gear system.

4.3. Simulation

To represent the reliability of the model presented in this paper more intuitively, four classical data-driven methods (i.e., SVM, naive Bayes (NB), BP neural network, and hidden Markov model (HMM)) are compared with the proposed method. The data used by the four methods are the same as those used by the proposed method: the 800 groups of monitoring data processed by mean filtering. However, because the training and test samples used in the proposed method total 400 sets of data, to keep the comparison fair, the mean value of each pair of adjacent mileage points in the 800 sets of monitoring data is taken, which reduces the data to approximately 400 sets. This process does not affect the accuracy of the evaluation model because, in engineering practice, two adjacent mileage points reflect almost the same health status of the running gear system. The health status grades for the four models were also "normal," "general," and "bad," with 150 sets of training data and 250 sets of test data.

SVM is used for comparative validation, as shown in Figure 10(a). In the beginning, the green line fits the blue line well, but in the middle, a large number of green points leave the blue line; in the end, the green line fits the blue line well. This indicates that the running gear system can be accurately assessed by the SVM model under normal and bad statuses, while the running gear system cannot be accurately assessed by the model in general status and at the beginning of bad status, which is a considerable limitation.

NB is used for comparative validation, as shown in Figure 10(b). From the degree to which the green line fits the blue line in the figure, the NB model behaves like the SVM model: it produces many misjudgments in the middle status of the system and also a certain number of misjudgments in the good and bad statuses. Compared with the SFA-STM model, the NB model still has difficulty accurately assessing the health status of the system.

The BP neural network is used for comparative validation, as shown in Figure 10(c). The fluctuation of the green line always follows that of the blue line. Although there is a certain degree of abnormal point fluctuation, the overall level of the BP neural network model is more accurate than that of the SVM model and the NB model in assessing the health status of the running gear system. However, the abovementioned figure clearly shows the red line of the SFA-STM model fits the blue line to a higher degree, which more accurately and intuitively reflects the health status of the system.

HMM is used for comparative validation, as shown in Figure 10(d). From the degree to which the green line fits the blue line in the figure, the HMM model behaves like the SVM model: it produces a large number of misjudgments in the middle status of the system and also a certain number of misjudgments in the good and bad statuses. Because both the NB and HMM models are based on Bayesian classification algorithms, it is not surprising that the HMM yields the same classification results as NB. In general, the HMM model still has difficulty accurately assessing the health status of the running gear system compared with the SFA-STM model.

To directly compare the accuracy of the five methods, three indexes are used. First, the sum of squared residuals (SSR) index is adopted, as shown in Table 2. It is computed according to equation (27):

$$\mathrm{SSR} = \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^{2}, \tag{27}$$

where $\hat{y}_i$ represents the status value obtained from the assessment of sample $i$ according to the model, $y_i$ represents the actual status value of the sample, and $N$ represents the number of samples. Among the five methods, the SSR value of SFA-STM is the lowest, only 9; the SSR value of SVM is 18; the SSR values of NB and HMM are 25; and the SSR value of BP is 18.9135 [37]. Second, the true positive rate (TPR) and false positive rate (FPR) indexes were used to assess the accuracy of classification. Table 3 shows that the SFA-STM model still has the best effect [38]. Finally, Table 4 compares the training time of each model; the training times of SFA-STM and SVM are relatively long because both adopt an optimization algorithm. Overall, the SFA-STM model has the highest accuracy in assessing the health status of the high-speed train running gear system.
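The three indexes can be computed as in the following sketch. It assumes that the status grades are encoded as numbers (for example 1, 2, and 3 for "normal," "general," and "bad") and that TPR and FPR are evaluated one class at a time against the rest; the paper does not state either choice, and the function names are hypothetical.

```python
def ssr(predicted, actual):
    """Sum of squared residuals between predicted and actual status values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

def tpr_fpr(predicted, actual, positive):
    """True positive rate and false positive rate for one class versus the rest."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    tn = sum(p != positive and a != positive for p, a in zip(predicted, actual))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```

Under this encoding, confusing "normal" with "bad" contributes 4 to the SSR while confusing adjacent grades contributes 1, so the index penalizes large grade errors more heavily than a plain error count would.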

5. Conclusion

In this study, the SFA-STM model is proposed to assess the health status of the high-speed train running gear system. It addresses the many unknown interferences and random noise in the monitoring data under complex working conditions, which affect the assessment results. A case study shows that the SFA-STM model can accurately reflect the actual health status of the running gear system. Compared with the four traditional data-driven models, this method is more applicable to practical engineering problems, and it provides a new solution for cases in which traditional feature extraction methods cannot extract the general trend of system changes needed to assess the health status of a complex system under high noise and multiple disturbances.

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61903047, 61973046, and 61803044 and Jilin Province Development and Reform Commission under Grant no. 2019C040-3.