Prediction Model Design for Vibration Severity of Rotating Machine Based on Sequence-to-Sequence Neural Network
The steam turbine rotor system is a key part of the power production process. Accurate prediction of the turbine rotor operating state allows hidden dangers to be detected in time and thus helps ensure efficient power production. The vibration severity reflects both the vibration intensity and the working condition. Since the accuracy of conventional prediction methods is insufficient, this paper proposes a new model that combines the sequence prediction model with the gated recurrent unit (GRU). To verify the effectiveness of the model, simulations are performed on steam turbine rotor unbalance fault data. The results show that the proposed model improves the prediction accuracy and could be utilized for vibration severity prediction as well as state warning of the steam turbine.
1. Introduction
Working condition prediction provides various benefits for rotating machine maintenance. It is a primary means of giving early indication of hidden dangers and of providing a reference for overhaul. Accurate prediction improves the safety level of the rotating machine [1–3].
At present, statistical prediction methods and artificial intelligence-based methods such as artificial neural networks are commonly employed for trend analysis. In a spectrum analysis-based approach, a gray prediction model has been employed to forecast the vibration severity from a small amount of reorganized steam turbine vibration data. The ensemble empirical mode decomposition (EEMD) method has been adopted to analyze the vibration data and obtain the intrinsic mode functions (IMFs). Although a prediction model has been employed to predict the IMF values and thereby obtain the dynamic vibration data, this model is not suitable for time-series prediction. In another work, information fusion has been combined with the backpropagation (BP) neural network to construct a prediction model: the data measured by the sensors were taken as the network input, and the network output was passed to a decision fusion model to obtain the final prediction results. Although the BP neural network is widely utilized to build predictive models, it is not well suited to time-series data. The support vector machine (SVM) method has been employed to construct a prediction model for every component produced by the empirical mode decomposition (EMD), with different kernel functions adopted for different components to acquire the final prediction results. In another study, a deep learning method based on the bidirectional long short-term memory (Bi-LSTM) unit has been employed for data prediction. However, due to its complex structure with many parameters, its convergence speed is slow.
In this paper, a gated recurrent unit- (GRU-) based sequence prediction model is proposed for the prediction and analysis of the vibration severity. Although accurate classification could be obtained through traditional learning methods such as BP and SVM, their prediction capability for time-series data is not acceptable. Since the GRU in the sequence prediction model has a gated memory cell, it alleviates the vanishing gradient (gradient disappearance) problem. The obtained results reflect the development trend of the vibration severity and meet certain precision requirements.
2. Materials and Methods
The prediction system structure is shown in Figure 1. The details of the prediction model and vibration severity are illustrated in the following.
2.1. The GRU-Seq2Seq Model Structure
The GRU-Sequence-to-Sequence (GRU-Seq2Seq) model is a variation of the recurrent neural network (RNN). The main difference is that the input and output sequences of a plain RNN have the same length, while the Seq2Seq model allows these sequences to have different lengths. In this paper, the model takes the base frequency amplitude of the past month as input to predict the data of the next week. This model is also called the encoder-decoder model, and its structure is shown in Figure 2.
In Figure 2, and are the input and output sequences, respectively. is a signal that is employed to start the decoder part.
This model contains the following four layers:
(1) The input layer: The historical data denote the model input where is the input signal at time .
(2) The encoder layer: The dynamic equations of the encoder layer are given as
The GRU of this layer encodes the input to the hidden state :
where and are the weights and the biases of the update, reset, and memory cell gates, respectively; is the hidden layer output at the last time; is the input at time t; is the memory cell at time t; ; is the activation function; and u is the number of hidden units. is the hidden state of the encoder layer at time and is considered as the decoder layer input.
(3) The decoder layer:
where is the encoder layer output at the last time; is the hidden state of the decoder layer at time . To realize the recurrent forecasting, the decoder layer input is considered as the decoder layer output at the last time, or , at the time . The decoder layer starts to decode when it receives the signal .
(4) The output layer:
where is the hidden state of the decoder layer at time ; is the output at time ; and are the weight and bias of the output layer, respectively. The final output sequence of the model is denoted by .
Operator definition: denotes the direct multiplication of the matrix; represents the multiplication of the corresponding elements.
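The four layers described above can be illustrated with a minimal pure-Python sketch. It assumes the standard GRU step, a scalar input signal, a start signal of 0.0, and randomly initialized (untrained) weights; the names `gru_step`, `init_gru`, and `seq2seq_forecast` and all parameter values are hypothetical and are not taken from the original model.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, v, b):
    """Matrix-vector product plus bias: W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(p, h_prev, x):
    """One standard GRU step; h_prev and x are lists of floats."""
    hx = h_prev + x                                          # concatenation [h_{t-1}, x_t]
    z = [sigmoid(a) for a in affine(p["Wz"], hx, p["bz"])]   # update gate
    r = [sigmoid(a) for a in affine(p["Wr"], hx, p["br"])]   # reset gate
    rhx = [ri * hi for ri, hi in zip(r, h_prev)] + x
    c = [math.tanh(a) for a in affine(p["Wh"], rhx, p["bh"])]  # candidate state
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, c)]


def init_gru(u, n_in, rng):
    """Random small weights for a GRU with u hidden units (assumption: untrained)."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(u + n_in)] for _ in range(u)]
    return {"Wz": mat(), "Wr": mat(), "Wh": mat(),
            "bz": [0.0] * u, "br": [0.0] * u, "bh": [0.0] * u}


def seq2seq_forecast(enc, dec, w_out, b_out, xs, n_out, go=0.0):
    """Encode the history xs, then decode n_out values recurrently."""
    h = [0.0] * len(enc["bz"])
    for x in xs:                       # encoder layer: consume the input sequence
        h = gru_step(enc, h, [x])
    y, ys = go, []                     # decoding starts from the start signal
    for _ in range(n_out):             # decoder layer: feed back the previous output
        h = gru_step(dec, h, [y])
        y = sum(w * hi for w, hi in zip(w_out, h)) + b_out   # output layer
        ys.append(y)
    return ys


rng = random.Random(0)
u = 8
encoder, decoder = init_gru(u, 1, rng), init_gru(u, 1, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(u)]
history = [math.sin(0.1 * i) for i in range(120)]   # synthetic stand-in for one month of data
forecast = seq2seq_forecast(encoder, decoder, w_out, 0.0, history, n_out=28)
```

The input length (120) and the output length (28) differ, which is the property that distinguishes the Seq2Seq structure from a plain RNN. Since the weights are untrained, the forecast values themselves carry no meaning.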
2.2. The Basic Unit of GRU-Seq2Seq
The GRU, a variation of the RNN unit, is the basic unit employed in this paper. The vanishing gradient (gradient disappearance) problem arises when the normal RNN unit processes time-series data, while the GRU avoids this problem by adding gates. It captures the dependencies between different points of the time series.
The GRU structure is shown in Figure 3.
This structure contains three gates: the update gate, the reset gate, and the memory gate. The update gate output denoted by determines how much information could be transmitted from the last state to the next state . The reset gate output denoted by determines the importance of the last state to the next state . In other words, if this value is equal to zero, the information will not be conveyed to the next memory cell . The memory gate output denoted by is a combination of the input at the current time and the state at the last time. The new vector includes the information of the last sequence and the input at this time.
The operation of the three gates is illustrated as follows:
(1) The update gate function is described as
(2) The reset gate function is described as
(3) The memory gate function could be described as
The input at time t could be converted to the hidden state where is the input at the next time:
In equations (5)–(8), are the corresponding outputs of these three gates; are the weights and biases of these three gates, respectively.
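For reference, the widely used GRU formulation of Cho et al. (listed in the references) can be written as follows. This is a sketch under the assumption that equations (5)–(8) follow the standard form; the symbol names, the concatenation notation $[\cdot,\cdot]$, and the sign convention of the last line are assumptions and are not recovered from the original equations. Here $\cdot$ denotes the matrix product and $\ast$ the element-wise product.

```latex
\begin{align}
z_t &= \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right) \\
r_t &= \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h \cdot [r_t \ast h_{t-1}, x_t] + b_h\right) \\
h_t &= z_t \ast h_{t-1} + (1 - z_t) \ast \tilde{h}_t
\end{align}
```

In this convention, $z_t$ close to one carries the last state forward almost unchanged, and $r_t = 0$ removes the last state from the candidate $\tilde{h}_t$. Some references swap the roles of $z_t$ and $1 - z_t$ in the last line.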
2.3. Model Training
The loss function could be defined as
where and are the predicted and real outputs, respectively; and are the weight and the bias of the output, respectively; is the regularization parameter; is the learning rate; and is the length of the output vector.
The details of the gradient descent method as the training algorithm are given as follows:
(1) Calculation of the weight gradient: The weight gradient of the update gate is obtained as
The weight gradient of the memory gate could be calculated as
The weight gradient of the reset gate is computed as
In relations (10)–(12), is the state gradient at the last time; is the input state at the last time; are the corresponding gradients of these three gates; is the learning rate; and is the regularization parameter.
(2) Updating the parameters: The update gate parameters are updated as
The memory gate parameters are updated as
The reset gate parameters are updated as
(3) The training process is shown in Figure 4. The training loop repeats until the parameters converge to their corresponding expected values.
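The pattern of the training loop (compute a regularized loss, compute the gradients, update the parameters with a learning rate) can be sketched on a single linear output layer. The gate gradients of relations (10)–(12) are not reproduced here; the mean-squared-error form with an L2 penalty, the function names, and the default values of `lr` and `lam` are assumptions made for illustration.

```python
def regularized_mse(y_pred, y_true, weights, lam):
    """Mean squared error plus an L2 penalty on the weights (assumed loss form)."""
    n = len(y_true)
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
    return mse + lam * sum(w * w for w in weights)


def train_output_layer(H, y_true, lr=0.1, lam=1e-3, epochs=500):
    """Fit y = w . h + b by plain gradient descent; returns w, b, and the loss history."""
    u, n = len(H[0]), len(y_true)
    w, b = [0.0] * u, 0.0
    history = []
    for _ in range(epochs):
        y_pred = [sum(wi * hi for wi, hi in zip(w, h)) + b for h in H]
        history.append(regularized_mse(y_pred, y_true, w, lam))
        err = [p - t for p, t in zip(y_pred, y_true)]
        # Gradients of the regularized loss with respect to w and b.
        grad_w = [2.0 / n * sum(e * h[j] for e, h in zip(err, H)) + 2.0 * lam * w[j]
                  for j in range(u)]
        grad_b = 2.0 / n * sum(err)
        # Parameter update with learning rate lr.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, history


# Toy data generated by y = 2 h + 1.
H = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w, b, history = train_output_layer(H, y)
```

In the full model the same update rule would be applied to the weights and biases of every gate, with the gradients obtained by backpropagation through time.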
2.4. The Rolling Prediction
In actual operating conditions, the prediction results are affected by different factors. Thus, the prediction data should be updated to ensure the accuracy of the results. The rolling prediction method is illustrated in Figure 5.
In Figure 5, the prediction model input at this time is denoted by where is the input at time ; the prediction output is denoted by where is the output at time . After the model acquires the real output , the input and output of the model at this time are described by and , respectively.
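The rolling procedure can be sketched as a sliding window: the model predicts a block of outputs, and once the real values of that block are acquired, the window advances so that they become part of the next input. In the sketch below the predictor is a simple persistence rule used as a hypothetical stand-in for the trained GRU-Seq2Seq model, and the window lengths are parameters; the assumption that the window advances by exactly one output block is not confirmed by the text.

```python
def rolling_forecast(series, predict, n_in, n_out, n_steps):
    """Return a list of (forecast, actual) pairs, one per rolling step."""
    results = []
    start = 0
    for _ in range(n_steps):
        window = series[start:start + n_in]
        actual = series[start + n_in:start + n_in + n_out]
        if len(actual) < n_out:        # not enough real data left to compare against
            break
        results.append((predict(window, n_out), actual))
        start += n_out                 # the real outputs enter the next input window
    return results


def persistence(window, n_out):
    """Placeholder predictor: repeat the last observed value (not the proposed model)."""
    return [window[-1]] * n_out


series = [float(i) for i in range(200)]
results = rolling_forecast(series, persistence, n_in=120, n_out=28, n_steps=2)
```

Any callable with the signature `predict(window, n_out)` can replace `persistence`, so the same loop applies to a trained network.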
2.5. The Vibration Severity Analysis
The vibration severity is extracted from the unbalance fault model. Then, the prediction accuracy of the model is evaluated. The GRU-Seq2Seq model flowchart employed in the steam turbine vibration severity analysis is shown in Figure 6.
2.6. Extracting Data from the Mathematical Model
Since the steam turbine rotor system operates in the high-speed working state for a long time, the incidence of the unbalance fault reaches 70%. The mathematical model of this fault is described as
where is the mass eccentricity; is the equivalent mass of the rotor and disk; are the linear and nonlinear stiffness coefficients of the rotating shaft, respectively; denote the damping coefficients of the rotor at bearing and disk, respectively; and are the horizontal and vertical displacements, respectively.
The mass eccentricity variation can simulate the unbalance fault level. The relationship between the mass eccentricity and time is described as
The mass eccentricity variation is related to the actual working condition. To make the prediction result closer to the actual effect, the following time-varying mass eccentricity (denoted by ) is proposed:
The units of the mass eccentricity and time are meters and seconds, respectively. Formulas (17) and (18) are substituted into the fault model (16). The Runge–Kutta method, a conventional method for solving differential equations, is then employed to obtain the horizontal displacement vibration data of the turbine rotor. With a sampling time of 1 s and a sampling interval of 6 hours, continuous sampling over one year yields a matrix containing 1440 sets of vibration displacement data, which is then stored.
2.7. The Displacement Vibration Data Processing
According to the previous section, the displacement vibration data could be obtained from the mathematical model of the unbalance fault given in (16). To extract the useful data from the total data and employ it as the input of the GRU-Seq2Seq prediction model, more processing is required.
The spectrum plot is extracted from the time-domain displacement vibration data by using the fast Fourier transform (FFT). The vibration data and the spectrum plot are shown in Figures 7 and 8, respectively.
Vibration severity contains all the frequency information and reflects the vibration intensity. Since the vibration severity is a representation of the energy, it is known as an important parameter for estimating the working condition of the turbine rotor system. By predicting it, the engineer could detect the rotor system faults and prevent the hidden danger.
For each signal , its power is defined as
The vibration severity function could be written as
Comparing relations (19) and (20) gives
Consider that is a periodic signal satisfying the Dirichlet conditions. Now, it could be described with the following Fourier series expansion:
According to relations (19) and (22), we have
Thus, the vibration function could be written as
where is the vibration displacement signal.
where is the vibration speed signal. are the frequency boundaries; is the amplitude of the frequency ; and is the number of the sample points.
Equation (16) could be employed for finding the vibration severity. The obtained vibration severities for and are shown in Figures 9 and 10, respectively.
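As an illustration of how a severity value can be computed from a sampled displacement signal, the sketch below assumes the common definition of vibration severity as the root-mean-square vibration velocity within a frequency band. Each displacement amplitude obtained from a discrete Fourier transform is converted to a velocity amplitude by multiplying with $2\pi f$. The function name, the band limits, and the test signal are hypothetical; a naive DFT from the standard library is used instead of an FFT routine to keep the example self-contained.

```python
import cmath
import math


def vibration_severity(x, fs, f_lo, f_hi):
    """RMS velocity of displacement samples x (sampled at fs Hz) within [f_lo, f_hi].

    If x is given in mm, the result is in mm/s.
    """
    n = len(x)
    total = 0.0
    for k in range(1, n // 2):
        f_k = k * fs / n                       # frequency of bin k
        if not (f_lo <= f_k <= f_hi):
            continue
        X_k = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        a_k = 2.0 * abs(X_k) / n               # single-sided displacement amplitude
        v_k = 2.0 * math.pi * f_k * a_k        # corresponding velocity amplitude
        total += 0.5 * v_k ** 2                # mean square of a sinusoid is A^2 / 2
    return math.sqrt(total)


# Test signal: 1 mm displacement amplitude at 50 Hz, sampled at 1 kHz for 0.2 s.
fs, n = 1000.0, 200
x = [math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(n)]
severity = vibration_severity(x, fs, f_lo=10.0, f_hi=500.0)
```

For this single-tone signal the result should equal $2\pi \cdot 50 / \sqrt{2} \approx 222.1$ mm/s, the RMS value of a sinusoidal velocity with peak $2\pi \cdot 50$ mm/s.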
The vibration severity is considered as the prediction model input and is denoted by matrix . In this paper, the data of the last month (120 samples) are employed to predict the data of the next week (28 samples). To validate the accuracy of the results, 80% and 20% of the data are used as the training and test sets, respectively.
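The window sizes follow from the 6-hour sampling interval: 4 samples per day give 120 samples for 30 days and 28 samples for 7 days. A minimal sketch of building input/output pairs and an 80%/20% split is given below; the stride of one sample and the chronological (unshuffled) split are assumptions, since the text does not state how the pairs were formed or divided.

```python
def make_windows(series, n_in=120, n_out=28, stride=1):
    """Slice a series into (input window, target window) pairs."""
    pairs = []
    for s in range(0, len(series) - n_in - n_out + 1, stride):
        pairs.append((series[s:s + n_in], series[s + n_in:s + n_in + n_out]))
    return pairs


def chronological_split(pairs, train_fraction=0.8):
    """Keep the earliest pairs for training and the latest for testing."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


series = list(range(1440))             # stand-in for the 1440 severity values
pairs = make_windows(series)
train, test = chronological_split(pairs)
```

A chronological split avoids evaluating the model on windows that precede its training data, which matters for time-series forecasting.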
The vibration severity for and are shown in Tables 1 and 2. More details about data of and could be found in the Data Availability section.
3. Results and Discussion
The root mean square error (RMSE) and mean absolute percentage error (MAPE) are chosen as the criteria to evaluate the accuracy of the prediction model.
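The two criteria, together with the maximum deviation reported alongside them, can be computed as in the following sketch. The function names are hypothetical, and the MAPE is expressed in percent and assumes that no true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (true values must be nonzero)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def max_deviation(y_true, y_pred):
    """Largest absolute difference between true and predicted values."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


y_true = [10.0, 20.0, 30.0]
y_pred = [11.0, 18.0, 33.0]
scores = (rmse(y_true, y_pred), mape(y_true, y_pred), max_deviation(y_true, y_pred))
```

For this toy example every prediction is off by 10% of the true value, so the MAPE is 10%, while the RMSE is $\sqrt{14/3} \approx 2.16$ and the maximum deviation is 3.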
By constructing the GRU-Seq2Seq rolling prediction model with two steps, the results shown in Figures 11–14 are obtained. The first and second rolling prediction results for are shown in Figures 11 and 12, respectively.
The obtained criteria for the first step prediction are given as and . Moreover, the obtained maximum deviation is .
The calculated criteria for the second step prediction are given as and . The calculated value for the maximum deviation is .
First and second rolling prediction results for are presented in Figures 13 and 14, respectively.
In this case, the following criteria are obtained for the first step prediction: and . In addition, the maximum deviation is obtained as .
The corresponding criteria for the second step prediction are and . Moreover, the maximum deviation in this case is .
According to these results, the output accuracy is good enough to satisfy the standard of the vibration severity monitoring. To avoid overfitting, EarlyStopping is utilized. This means that the performance indicators are continuously monitored in each iteration to stop training if the training accuracy is satisfactory. The mean square error (MSE) is employed as the performance indicator in the EarlyStopping procedure. By choosing , the convergence time and the obtained results are compared with BP and LSTM-Seq2Seq models (as shown in Table 3).
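A common way to implement such a stopping rule is to monitor the MSE after every epoch and stop once it has failed to improve for a fixed number of epochs. The sketch below shows this patience-based variant; the rule itself, the function name, and the `patience` and `min_delta` values are assumptions, since the exact stopping setting used in the experiments is not recoverable from the text.

```python
def early_stopping_epoch(losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_loss) for a sequence of per-epoch MSE values."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:    # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:                          # no improvement in this epoch
            wait += 1
            if wait >= patience:
                return epoch, best
    return len(losses) - 1, best       # the stopping rule was never triggered


mse_per_epoch = [1.0, 0.5, 0.4, 0.41, 0.42, 0.39]
stop_epoch, best_mse = early_stopping_epoch(mse_per_epoch)
```

Here training stops at epoch 4 (counting from zero), after two consecutive epochs without improvement over the best value of 0.4, so the later value of 0.39 is never reached. The MSE is usually monitored on held-out validation data, so that stopping reacts to generalization rather than to the training fit.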
The rolling prediction results of all three models are stable, while the GRU-Seq2Seq model achieves the highest accuracy, a considerable improvement over the BP and LSTM-Seq2Seq models. The GRU-Seq2Seq model is therefore more suitable for data prediction of precision machinery such as steam turbines.
According to the ISO 2372 standard, vibration severity values higher than 28.0 mm/s are not acceptable for steam turbine rotating machines. The rolling prediction results for show that the time point at which the vibration severity exceeds the ISO standard could be acquired (Figure 15).
As can be seen from Figure 15, the rolling prediction task stops as soon as the vibration severity exceeds the ISO limit (28.0 mm/s). This occurs after 165 days. By using these prediction results, the working condition of the rotating machine could be estimated and its maintenance could be performed in time.
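The warning step can be sketched as a search for the first predicted value above the limit, converted to days through the 6-hour sampling interval (4 samples per day). The function name and the linearly rising severity sequence are hypothetical and serve only as an illustration; they do not reproduce the data behind Figure 15.

```python
def first_exceedance(severity, limit=28.0, samples_per_day=4):
    """Return (sample index, elapsed days) of the first value above limit, or None."""
    for i, v in enumerate(severity):
        if v > limit:
            return i, i / samples_per_day
    return None


# Synthetic severity trend in mm/s, rising by 1/80 mm/s per sample.
predicted = [20.0 + i / 80.0 for i in range(1440)]
warning = first_exceedance(predicted)
```

For this synthetic trend the limit is first exceeded at sample 641, that is, after 160.25 days. In a rolling setup the check would be applied to each newly predicted block, and the rolling loop would stop at the first warning.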
4. Conclusions
The results indicate that the GRU-Seq2Seq prediction model is suitable for data prediction of the vibration severity. Considering the defects of the existing trend analysis algorithms for long-term time-series prediction, a sequence prediction model combined with the GRU is proposed in this paper. This model provides the vibration severity characteristics of the rotating turbine machine. A sliding time window is employed to continue the rolling prediction procedure, so that possible risks for the rotating machine could be indicated, which facilitates equipment maintenance. In future work, the prediction results could be combined with maintenance to ensure the safe operation of rotating machines.
Data Availability
Readers can calculate the vibration severities for different values of e. To this end, the code required to calculate the vibration severity from the mathematical model, together with the GRU-Seq2Seq neural network, is available through the following link and password: https://pan.baidu.com/s/1Z-ltiZdezzj85cuaQHJepQ; password: y0uK. Readers can access the data after registration. The registration link is https://passport.baidu.com/v2/?reg&tt=1555071708648&overseas=1&gid=B4F5FD0-6269-450C-B929-CD00BF68770D&tpl=netdisk&u=https://pan.baidu.com/disk/home.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
The authors acknowledge the financial support from the Shanghai Science and Technology Commission Local Capacity Building Project.
References
W. C. Laws and A. Muszynska, “Periodic and continuous vibration monitoring for preventive/predictive maintenance of rotating machinery,” Journal of Engineering for Gas Turbines and Power, vol. 109, no. 2, pp. 159–167, 1987.
K. Madani, “A survey of artificial neural networks based fault detection and fault diagnosis techniques,” in Proceedings of the International Joint Conference on Neural Networks, pp. 3442–3446, Washington, DC, USA, July 1999.
R. Isermann, “Process fault detection based on modelling and estimation methods: a survey,” Automatica, vol. 20, no. 4, pp. 387–404, 1984.
Y. Feng, Research on Diagnosis and Early Warning System of Important Rotating Equipment Status and Performance in Nuclear Power Plant, Shanghai University of Electric Power, Shanghai, China, 2018.
G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden Day, San Francisco, CA, USA, 1976.
J. van Amerongen, H. R. van Nauta Lemke, and J. C. T. van der Veen, “Autopilot for ships designed with fuzzy sets,” in Proceedings of the 5th IFAC/IFIP Conference on Digital Computer Applications to Process Control, pp. 479–487, The Hague, Netherlands, June 1977.
J. W. Taylor, P. E. McSharry, and R. Buizza, “Wind power density forecasting using ensemble predictions and time series models,” IEEE Transactions on Energy Conversion, vol. 24, no. 3, pp. 775–782, 2009.
S. Wang, X. Wang, S. Wang, and D. Wang, “Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting,” International Journal of Electrical Power & Energy Systems, vol. 109, pp. 470–479, 2019.
K. Cho, B. van Merrienboer, C. Gulcehre et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” 2014, http://arxiv.org/abs/1406.1078.
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
J. Chung, C. Gulcehre, K. H. Cho et al., “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, http://arxiv.org/abs/1412.3555.
L. Zhao, Investigation on Failure Mechanism and Shift Orbits Recognition Methods for Rotating Machine, Dalian University of Technology, Dalian, China, 2010.