Abstract
This paper presents a novel fault diagnosis method based on data fusion for a reaction flywheel of the satellite attitude system. Unlike most traditional fault diagnosis techniques, the proposed solution accomplishes fault detection and identification simultaneously within parallel fusion blocks. The core of the method is the independent fusion block, which uses a generalized ordered weighted average (GOWA) operator to combine the complementary characteristics of the output data from a long short-term memory (LSTM) neural network and the discrete wavelet transform (DWT) so as to enhance the reliability and rapidity of decision-making. Moreover, minibatch normalization is adopted to address the problem of covariate shift, realize adaptive processing of the dynamic information in the original data, and improve the convergence speed of the network. Using a high-fidelity model of the reaction flywheel, three common faults are injected separately to collect experimental data. Extensive experimental results show the efficacy of the proposed method and the excellent performance achieved by LSTM and DWT.
1. Introduction
In recent years, the aerospace industry has become interested in preventive maintenance systems. A technological shift is ongoing in system monitoring from traditional fixed interval maintenance (FIM) to condition-based maintenance (CBM) [1, 2]. Compared with FIM, CBM eliminates unnecessary scheduled preventive maintenance, reduces maintenance cost, and improves the safety and reliability of the system. CBM includes a variety of technologies, such as performance monitoring and fault diagnosis, which ultimately enable the system to detect faulty components and take appropriate measures [3].
Fault diagnosis, whose main task is to detect the time at which a fault occurs and to identify its type, has received wide attention in CBM. The diagnosis information is then used to upgrade the maintenance operations from FIM to CBM [4, 5]. Nowadays, intelligent technology represented by neural networks further improves CBM, and many intelligent fault diagnosis methods have therefore been studied. Generally, neural network-based diagnosis methods can be divided into two categories. The first designs an observer or estimator, or fits the system function, by exploiting the nonlinear fitting ability of the network. Xin et al. [6] used a single hidden layer feedforward wavelet neural network to design an adaptive observer that estimates the state of the system, and the residual serves as the fault detection signal. Inspired by the concept of fault parameters, some researchers proposed single fault detection methods based on a neural network fault parameter estimator [7, 8]. On this basis, Sobhani-Tehrani et al. [9] designed a new fault diagnosis structure, which solved the problem of detecting and isolating multiple faults in the system by running multiple parameter estimators in parallel. Witczak et al. [10] proposed an observer method based on the state-space recurrent neural network (RNN). The approach guarantees a predefined disturbance attenuation level and convergence of the observer, as well as unknown input decoupling and state and actuator fault estimation.
The second category uses the feature extraction ability of neural networks to detect and classify faults. The diagnosis procedure generally includes three steps, i.e., data collection, artificial feature extraction, and health-state recognition [11, 12]. A novel health monitoring system for a variable air volume unit is developed in [13]. After fault features for various fault types are generated via fuzzy logic, an artificial neural network classification technique is applied to fault classification. Sobie et al. [14] used training data obtained from high-resolution simulations of roller bearing dynamics to train machine learning algorithms. Although simulated data cannot replace experimental data, they provide a stronger starting point for novel applications to further improve fault classifier performance. Barakat et al. [15] introduced the growing neural network to construct a diagnosis model for motor bearings, which achieved higher diagnosis accuracy on a large amount of data than the conventional RBFN and the probabilistic neural network. A new approach named hybrid gradient boosting is proposed in [16] for fault detection in robotic arms. The method is based on a logistic regression model, and random forests, neural networks, and XGBoost models are used to boost this baseline model.
At present, there is an end-to-end trend in the field of deep learning, which combines feature extraction and classifier design into a single neural network. This scheme overcomes the need for manual intervention in traditional intelligent diagnosis methods. The motivation is that a neural network can automatically learn both the features of the original data and the classifier, so as to better adapt to fault diagnosis and improve performance [17–19]. To deal with the dynamic information of time series, recurrent neural networks, such as the long short-term memory (LSTM) neural network and its variants, have shown excellent performance, especially in complex time series problems including signal processing, fault classification, and price prediction [20, 21]. By adjusting the inputs and outputs of the LSTM cell with nonlinear gate units, LSTM can learn the dynamic information of the time series adaptively. However, it should be noted that the accuracy of this kind of method depends on the data types and the prior data of the system.
With the development of sensor technology, the types of system data collected have become more abundant, which has promoted the development of data fusion technology and its wide application in different fields [22–24]. Data fusion can use the complementarity between data sets to improve the accuracy of the decision process. In addition, the cooperative use of overlapping data reduces the uncertainty in the system and leads to more reliable results [25, 26]. The ordered weighted average (OWA) and its variants, as robust tools for aggregating data from various sources, have become representative data fusion methods [27]. The technology is now also gradually being applied in the field of fault diagnosis and prognosis to improve feature extraction and fault decision [28]. Rezamand et al. [29] proposed a novel real-time failure prognosis method for wind turbine bearings in which the OWA operator is applied to combine information obtained from various single features to provide relatively accurate predictions. A new type of fuzzy Petri net (FPN) was introduced in [30], which overcomes the deficiencies of traditional FPNs by using intuitionistic fuzzy sets (IFSs) and intuitionistic fuzzy ordered weighted average (IFOWA) operators. This makes the model include a wide range of particular cases so that it can effectively handle the various uncertainties in knowledge acquisition and representation. Sánchez-Fernández et al. [31] achieved fault identification based on a scored ranking at two time points: early fault and steady fault. In each case, the OWA linguistic operator based on the regular increasing monotone (RIM) function can find the variables that are responsible for the fault. However, the specific application value of OWA still needs further study.
The reaction flywheel (RW), as the actuator of the satellite attitude control system (ACS), is essential to the successful implementation of space missions. The RW consists of several highly nonlinear internal and external circuits. Generally, only the feedback control signal and the attitude sensor can be used for monitoring. Therefore, fault diagnosis and health monitoring of the RW are very challenging.
In this paper, parallel fusion blocks are used to detect and identify three common faults of the RW: motor current reduction, bus voltage insufficient, and work temperature over-high. In each block, a data fusion method based on LSTM and DWT is designed. The main motivation for integrating LSTM and DWT is to improve the performance of fault diagnosis.
The main contributions of this paper are as follows:
(1) The parallel structure of fusion blocks reduces the computational complexity, which allows the network to achieve higher accuracy and speed in identifying the initial fault.
(2) The LSTM can adaptively learn the dynamic information of sequential data. The minibatch normalization method is applied to solve the problem of covariate shift and improve the convergence of LSTM.
(3) The output information of LSTM and DWT is complemented by the GOWA operator, which overcomes the shortage of data types and improves the real-time detection ability and identification accuracy of the fault diagnosis system.
2. Minibatch Normalization-Based Vanilla LSTM
Vanilla LSTM, an improved form of LSTM, is the variant most commonly used in sequential data prediction and classification [32]. The internal structure of the vanilla LSTM is illustrated in Figure 1.
Figure 1 labels the prior cell state and output, the current cell state and output, and the cell input, input gate, forget gate, and output gate, together with the sigmoid and tanh functions. As the figure shows, the vanilla LSTM has an architecture similar to that of an RNN, but it is built from a set of connected cells. Unlike an RNN cell, which overwrites its information directly, each cell of the vanilla LSTM contains a cell input, three gates, and a cell output. The cell output is recurrently connected back to the cell input and all the gates. This design allows the vanilla LSTM to robustly remove or add information over long periods of time.
2.1. Minibatch Normalization
In a deep neural network, a change of parameters in one layer usually has a serious impact on the input distributions of subsequent layers. We refer to this phenomenon as covariate shift, which can be addressed by normalizing layer inputs [33].
Since the full whitening of each layer's input is costly and not differentiable everywhere, we make two necessary simplifications in this paper. The first is that we normalize each scalar feature independently instead of jointly whitening the features in layer inputs and outputs. For a layer with $n$-dimensional input $x = (x^{(1)}, \ldots, x^{(n)})$, each dimension is normalized as
$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}, \qquad (1)$$
where the expectation and variance are calculated over the training data set. Note that simply normalizing each input to a layer may change what the layer can represent. To solve this problem, we introduce a pair of parameters $\gamma^{(k)}, \beta^{(k)}$ for each activation $x^{(k)}$, which determine the standard deviation and mean of the normalized features:
$$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}. \qquad (2)$$
When using stochastic optimization, normalizing activations over the entire training set is impractical. Thus, the second simplification is that each minibatch produces estimates of the mean and variance of each activation. In this way, the statistics used for normalization can fully participate in the gradient backpropagation.
Consider a minibatch $\mathcal{B}$ of size $m$:
$$\mathcal{B} = \{x_1, \ldots, x_m\}. \qquad (3)$$
Let the normalized values be $\hat{x}_1, \ldots, \hat{x}_m$ and their linear transformations be $y_1, \ldots, y_m$. We refer to the transform
$$\mathrm{BN}_{\gamma,\beta}: x_1, \ldots, x_m \rightarrow y_1, \ldots, y_m \qquad (4)$$
as the minibatch normalization transform.
The minibatch normalization transform is presented in Algorithm 1.

In Algorithm 1, $\epsilon$ is a regularization parameter added to the minibatch variance for numerical stability.
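The minibatch normalization transform can be sketched in a few lines of Python for a single scalar activation. This is a minimal illustration rather than the implementation used in the experiments; the function name `minibatch_normalize` and the default values of `gamma`, `beta`, and `eps` are assumptions made for the example.

```python
import math

def minibatch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one scalar activation over a minibatch, then scale and shift.

    gamma, beta, and eps are illustrative defaults, not values from the paper.
    """
    m = len(batch)
    mean = sum(batch) / m                          # minibatch mean
    var = sum((x - mean) ** 2 for x in batch) / m  # biased minibatch variance
    # eps keeps the division well defined when the variance is close to zero
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * xh + beta for xh in x_hat]

# Hypothetical minibatch of four activations
y = minibatch_normalize([1.0, 2.0, 3.0, 4.0])
```

With the default `gamma` and `beta`, the returned values have approximately zero mean and unit variance; learned values of `gamma` and `beta` would rescale and shift them.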
2.2. Forward Pass
Let be the input at time , and are the number of inputs and LSTM cells, respectively. According to the architecture of the vanilla LSTM, the equations for layer forward pass can be written aswhere represent the input weights, recurrent weights, peephole weights, and bias weights, respectively. represent the sigmoid function and tanh function; denotes the elementwise multiplication, and the initial states are the parameters of the network.
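For orientation, the forward pass of a vanilla LSTM layer with peephole connections is commonly written in the form below. This is a sketch in one widely used notation and may differ from the notation of this paper: $W_\star$, $R_\star$, $p_\star$, and $b_\star$ are assumed names for the input, recurrent, peephole, and bias weights, $z$, $i$, $f$, $o$ for the cell input and the three gates, $c$ for the cell state, and $y$ for the cell output.

```latex
\begin{aligned}
z^{t} &= \tanh\left(W_z x^{t} + R_z y^{t-1} + b_z\right), \\
i^{t} &= \sigma\left(W_i x^{t} + R_i y^{t-1} + p_i \odot c^{t-1} + b_i\right), \\
f^{t} &= \sigma\left(W_f x^{t} + R_f y^{t-1} + p_f \odot c^{t-1} + b_f\right), \\
c^{t} &= z^{t} \odot i^{t} + c^{t-1} \odot f^{t}, \\
o^{t} &= \sigma\left(W_o x^{t} + R_o y^{t-1} + p_o \odot c^{t} + b_o\right), \\
y^{t} &= \tanh\left(c^{t}\right) \odot o^{t}.
\end{aligned}
```

Note that the input and forget gates see the previous cell state through their peepholes, whereas the output gate sees the updated cell state.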
Apply minibatch normalization to vanilla LSTM. In order to avoid unnecessary redundancy and over fitting, set in minibatch normalization. According to Algorithm 1 and (5), the forward pass of minibatch normalizationbased vanilla LSTM (hereinafter referred to as LSTM) is as follows:where
Because training data is standardized before training, there is no need to normalize .
2.3. Backpropagation through Time
The deltas inside the LSTM cell are calculated aswhere denotes the delta passed down from the upper layer. Only when there is a layer to be trained below, the input deltas are needed, which can be calculated as follows:
Then, the gradients for weights are computed aswhere denotes any of and denotes the outer product of two vectors.
3. The Proposed Method for Fault Diagnosis
In this section, a data fusion fault diagnosis method for the actuator of the satellite attitude system is proposed. The main purpose is to design a fault diagnosis system that detects and identifies faults in the RW. Timely fault diagnosis not only gives the satellite enough time to take appropriate measures before the fault develops further but also provides fault information for predicting the remaining service life of the system. The following content defines the fault set of the actuator of the satellite attitude system and introduces the detailed design process.
3.1. Fault Set of the Reaction Flywheel
The RW considered in this paper is the ITHACO “type A” reaction flywheel manufactured by Goodrich Corporation. A high-fidelity nonlinear model of the RW can be obtained from [34] and has been integrated into the ACS dynamics.
Three identical RWs are used in a three-axis stabilized satellite (disregarding the often redundant fourth RW), and each RW has a dedicated fault diagnosis block. Since the three RWs are identical, only the results of the fault diagnosis block for the pitch axis are studied. The training data set is obtained from the closed-loop ACS simulation of the three-axis stabilized satellite. The moment of inertia of three axes is . The initial angle of the satellite is 5, and the running time is 100 seconds.
In this paper, three types of common faults in the RW are considered: motor current reduction, bus voltage insufficient, and work temperature over-high. Generally, current and voltage faults are transient, while the temperature fault accumulates slowly. Thus, the motor current reduction fault is modeled and injected as variations in motor torque gain . The bus voltage insufficient fault is modeled and injected as drops in the voltage of the power bus . The work temperature over-high fault is modeled and injected as a slope function in the standard temperature . In other words, , , and are used to replace , and , where , and are defined as the fault parameters, collected in Table 1. Different fault parameters are injected into the RW separately to obtain the satellite pitch angle and feedback control signal. These measurements are very precise and are little affected by noise. Therefore, we add only 1% white noise to them to mimic more realistic conditions.
3.2. The Data Fusion Fault Diagnosis Method
As Figure 2 illustrates, a parallel fault diagnosis framework is designed to detect and identify the fault.
Figure 2 shows that three parallel fusion blocks are responsible for the diagnosis of motor current reduction, bus voltage insufficient, and work temperature over-high, respectively. In each block, the fusion method based on LSTM and DWT is used for fault detection and identification.
3.2.1. The Neural Network
A neural network is an effective tool for estimating complex nonlinear functions and is widely used in the field of fault diagnosis. It constructs a mapping between the input and output of the system from the available data set [35]. In this paper, the satellite pitch angle and feedback control signal are selected as the inputs of the neural network.
With the proposed LSTM, the diagnosis of fault data is straightforward. The offline modeling and online monitoring flowcharts are shown in Figure 3. The procedures of offline modeling and online monitoring are as follows:
Offline modeling:
(1) Collect data of different fault types as training data.
(2) Normalize each feature of the training data.
(3) Train the three parallel LSTMs with Adam.
(4) Calculate the loss function . If and the number of iterations , go to (3), where is a small positive number and is the maximum number of iterations.
(5) Output the parameters of the neural network.
Online monitoring:
(1) Sample new raw data as testing data:
3.2.2. The DWT Method
The wavelet transform can explore the local characteristics of signals and analyze signals with different time and frequency resolutions [36]. DWT discretizes the scale and translation parameters of the wavelet transform, and it can be used for adaptive time-frequency analysis of nonstationary signals. Moreover, DWT is able to capture both the frequency and the location information of a signal. Therefore, DWT is an excellent tool for fault diagnosis.
Consider a signal , which can be constructed by the linear combinations of scaling functions and orthogonal wavelets as follows:where represent the scaling functions and orthogonal wavelets, respectively. And represent the dilation and translation factors, respectively. Approximation coefficients and detail coefficients can be computed by
As shown in Figures 4 and 5, the value of the detail coefficient will jump when the fault occurs in the system, and amplitudes are different for various faults. Therefore, the fault type can be identified by analyzing these coefficients.
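The jump in the detail coefficients at an abrupt change can be illustrated with a one-level Haar transform. This sketch swaps in the Haar wavelet for simplicity instead of the db4 wavelet at level 4 used in this paper, and the test signal is hypothetical.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    starts = range(0, len(signal) - 1, 2)  # non-overlapping sample pairs
    approx = [s * (signal[i] + signal[i + 1]) for i in starts]
    detail = [s * (signal[i] - signal[i + 1]) for i in starts]
    return approx, detail

# Hypothetical signal: constant until an abrupt drop at sample 5
signal = [1.0] * 5 + [0.2] * 5
approx, detail = haar_dwt(signal)

# The detail coefficient with the largest magnitude locates the change
jump_index = max(range(len(detail)), key=lambda k: abs(detail[k]))
```

Here only the pair that straddles the drop yields a nonzero detail coefficient. A change that falls exactly between two Haar pairs would not appear at this level, which is one reason longer wavelets and several decomposition levels are used in practice.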
3.2.3. The Fusion Decision Based on GOWA
The GOWA operator can aggregate data more effectively and sensitively by adding an additional parameter to OWA, so it is considered a tool for multi-information decision [37–39]. Because the attitude data are preprocessed by LSTM and DWT, the fusion method based on GOWA, unlike other OWA diagnostic methods, does not require designing a complex operator such as one using intuitionistic fuzzy sets, a linguistic operator, or induced continuous OWA. A simple decision process is sufficient to achieve accurate diagnosis.
Figure 2 shows the internal diagram of the fault diagnosis system based on the data fusion method. Three parallel fusion blocks are used to identify different fault types. In each block, the GOWA operator is used to integrate the decision outputs of LSTM and DWT into a unique framework. The fusion outputs of every block are as follows:where , and are the outputs of the GOWA operators for motor current reduction, bus voltage insufficient, and work temperature over-high, respectively. is the output of the DWT. denote the weight factors of the LSTM and DWT for different fault types. The generalized parameter is set to 2.
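As a minimal sketch of the aggregation step, the standard GOWA operator sorts its arguments in descending order and computes a weighted power mean with the generalized parameter. The function name `gowa`, the example outputs, and the example weights are assumptions for illustration; the weights actually used are those of Table 3, and the fusion formula of this paper may associate the weights with the LSTM and DWT sources rather than with ordered positions.

```python
def gowa(values, weights, lam=2.0):
    """Generalized OWA: (sum_j w_j * b_j**lam) ** (1/lam),
    where b_j is the j-th largest of the (non-negative) values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # the ordering step of OWA
    return sum(w * b ** lam for w, b in zip(weights, ordered)) ** (1.0 / lam)

# Hypothetical decision outputs: LSTM reports 0 (health), DWT reports 1 (fault)
fused = gowa([0.0, 1.0], [0.6, 0.4], lam=2.0)
```

With `lam=1.0` the operator reduces to the ordinary OWA operator, and putting all weight on the first position makes it return the maximum of its arguments.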
4. Simulation Analysis
In this section, the effectiveness of the proposed fault diagnosis method for RW of the satellite attitude system is verified by simulation experiments.
4.1. Fault Scenarios
This paper considers three common faults in the ACS: motor current reduction, bus voltage insufficient, and work temperature over-high. Table 1 presents the healthy value ranges of the corresponding parameters, so fault data can be obtained by injecting different faults into the simulation model.
4.1.1. Motor Current Fault
The motor has a torque constant , which delivers a torque proportional to the current driver, i.e., . Therefore, can be used to reflect the changes in motor current. When drops outside a certain range, the RW cannot provide sufficient control torque, which leads to fault of ACS.
4.1.2. Bus Voltage Fault
The bus voltage needs to be set high enough to avoid insufficient voltage margin. When the bus voltage is insufficient, the EMF of the motor will rise so that the maximum torque that the motor can provide decreases. Eventually, it affects the stability of the satellite attitude.
4.1.3. Temperature Fault
Viscous friction is generated due to the bearing lubricant, and it has a strong sensitivity to the temperature :
Note that it is the main friction in RW. Therefore, when the temperature exceeds the threshold, the generated friction will reduce the control torque and result in failure.
4.2. Experiment Setup
4.2.1. Implementation of LSTM
As shown in Figure 2, the pitch attitude and the feedback control signal of the satellite are regarded as the inputs of the network with 5 time steps. The LSTM contains 12 memory cells, and the output of the memory cells is read through a fully connected layer with the tanh activation function to generate the network output.
In training, the minibatch size and the number of epochs are set to 30 and 50, respectively. The result in Figure 6 illustrates that the BN-based LSTM converges significantly faster than the baseline LSTM.
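One plausible way to prepare such inputs is to slide a window of 5 time steps over the two measured signals. The sketch below only illustrates this windowing; the function name `make_windows`, the unit stride, and the sample values are assumptions and are not taken from the experiments.

```python
def make_windows(pitch, control, steps=5):
    """Stack two signals into overlapping windows of `steps` time steps.

    Each window is a list of (pitch, control) pairs, i.e. one network input.
    """
    if len(pitch) != len(control):
        raise ValueError("signals must have the same length")
    samples = list(zip(pitch, control))
    return [samples[t:t + steps] for t in range(len(samples) - steps + 1)]

# Hypothetical signals with 8 samples each
windows = make_windows([0.1 * k for k in range(8)], [1.0] * 8)
```

Eight samples with a window of 5 steps and a stride of 1 yield four windows, each holding five two-feature samples.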
Under operating condition, any output value is regarded as health, and any other value is regarded as fault. The value is considered as an unknown fault, and the output of the network is set to −2, indicating that an unknown fault is detected in the system.
In order to illustrate the online work procedure of the blocks, it is assumed that a motor current fault has occurred. In this case, , while . Furthermore, 100 tests are carried out to evaluate the accuracy of the LSTM for different faults. Table 2 presents the results, where the labels denote motor current reduction, bus voltage insufficient, work temperature over-high, health, and unknown types, respectively. Table 2 indicates that the error rate between the voltage fault and the temperature fault is higher than the others. This problem can be mitigated by combining the outputs of LSTM and DWT.
4.2.2. Implementation of DWT
As mentioned above, the rapidly changing detail coefficients in the event of a fault can be used to detect the fault. For example, consider a motor current fault with the parameter occurring in the system at . The db4 wavelet and a decomposition level of 4 are selected, and Figures 4 and 5 show the behavior of the approximation and detail coefficients of the fault for the satellite attitude and the control signal, respectively.
The temperature fault changes very slowly as the temperature accumulates, and it is usually difficult to observe at the onset of the fault. For motor current reduction and bus voltage insufficient, the sudden change of the fault parameters leads to higher detail coefficients. This feature distinguishes them from work temperature over-high, so the DWT identification mechanism is straightforward. The fault identification logic with DWT is as follows:
For pitch angle,
For control signal,
If (15) and (16) are both satisfied, DWT outputs “1,” indicating that the system identifies the fault as motor current reduction or bus voltage insufficient. Otherwise, the output is “0”. However, DWT cannot detect the temperature fault. Thus, this paper combines the DWT method with LSTM as an auxiliary fault diagnosis tool to improve the accuracy of the fault diagnosis system.
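The two-condition logic can be sketched as follows, under the assumption that each condition compares the magnitude of the detail coefficients with a threshold. The function name `dwt_decision`, the threshold values, and the coefficient sequences are hypothetical and are not the values of conditions (15) and (16).

```python
def dwt_decision(detail_pitch, detail_control, thr_pitch, thr_control):
    """Return 1 when both detail-coefficient sequences exceed their
    thresholds (abrupt fault suspected), otherwise 0."""
    pitch_hit = max(abs(d) for d in detail_pitch) > thr_pitch
    control_hit = max(abs(d) for d in detail_control) > thr_control
    return 1 if (pitch_hit and control_hit) else 0

# Hypothetical coefficients: an abrupt fault and a slowly developing one
abrupt = dwt_decision([0.01, 0.60, 0.02], [0.00, 0.45, 0.01], 0.3, 0.3)
slow = dwt_decision([0.01, 0.02, 0.02], [0.00, 0.01, 0.01], 0.3, 0.3)
```

Requiring both signals to respond makes the decision less sensitive to a spurious spike in a single measurement, at the cost of missing slowly developing faults such as the temperature fault.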
4.2.3. Weights Determination of GOWA Operator
The function of GOWA operator is to fuse the outputs of LSTM and DWT to make fault decision. These weight factors can be determined by optimizing the following cost functions:where is a vector of weighting factors, is the output of block at iteration, denotes the ideal output of iteration, and “0” represents the health and “1” represents the fault.
Since DWT is unable to identify the temperature fault, only LSTM is used to identify it. Table 3 presents the weighting factors of the fusion method. The design parameters of LSTM and DWT are summarized in Table 4.
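As an illustration of how such weights could be chosen, the sketch below runs a simple grid search that minimizes a squared-error cost between the fused output and the ideal output. The squared-error form, the grid search, the function names, and the data are all assumptions made for the example; they are not the cost function or the optimizer of this paper.

```python
def gowa(values, weights, lam=2.0):
    """Generalized OWA over values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(w * b ** lam for w, b in zip(weights, ordered)) ** (1.0 / lam)

def fit_weights(outputs, targets, grid=101):
    """Grid search over w1 in [0, 1] with w2 = 1 - w1, minimizing the
    summed squared error between fused and ideal outputs."""
    best_w, best_cost = None, float("inf")
    for k in range(grid):
        w1 = k / (grid - 1)
        weights = (w1, 1.0 - w1)
        cost = sum((gowa(pair, weights) - t) ** 2
                   for pair, t in zip(outputs, targets))
        if cost < best_cost:
            best_w, best_cost = weights, cost
    return best_w, best_cost

# Hypothetical (LSTM, DWT) decisions and ideal outputs (0 = health, 1 = fault)
outputs = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
targets = [1.0, 0.0, 1.0, 1.0]
best_w, best_cost = fit_weights(outputs, targets)
```

For this toy data the search places all weight on the largest argument, so the fused decision flags a fault whenever either source does.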
4.3. Performance Evaluation
In the experiment, the fault classification performance of DPCA + SVM, DLDA + SVM, DWT + MLP, and DWT + LSTM is compared. In DPCA and DLDA, 5 samples are concatenated to form the extended vectors. For SVM, we choose RBF as the kernel with the parameter , where is the number of the extracted features of DPCA or DLDA. DWT + MLP is similar to the proposed fusion method, but LSTM is replaced with MLP. The results of the four fault identification methods are shown in Figure 7. In each confusion matrix, the rows represent the actual labels and the columns represent the predicted labels. The diagonal cells show where the actual and predicted labels match, and the off-diagonal cells show the misclassified instances. Note that the unknown fault has no test data, so its accuracy rate defaults to 1.
[Figure 7: confusion matrices of the four fault identification methods, panels (a) to (d).]
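The row and column convention of the confusion matrix, together with the default accuracy of 1 for a class without test data, can be sketched as follows. The label names and the sample predictions are hypothetical and do not reproduce the data of Figure 7.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def per_class_accuracy(matrix):
    """Diagonal count divided by row total; a class without test data gets 1."""
    return [row[i] / sum(row) if sum(row) else 1.0
            for i, row in enumerate(matrix)]

labels = ["current", "voltage", "temperature", "health", "unknown"]
actual = ["current", "current", "voltage", "voltage", "temperature", "health"]
predicted = ["current", "current", "voltage", "temperature", "temperature", "health"]

matrix = confusion_matrix(actual, predicted, labels)
accuracy = per_class_accuracy(matrix)
```

In this toy example one voltage fault is misclassified as a temperature fault, so the voltage row has one off-diagonal entry and an accuracy of 0.5.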
It can be seen from Figure 7 that the fusion method correctly classifies 97% of the motor current reduction cases, 96% of the bus voltage insufficient cases, 94% of the work temperature over-high cases, and 96% of the health condition cases. Compared with the traditional “feature extraction + classifier” mode, such as DPCA + SVM and DLDA + SVM, the proposed method has higher diagnostic accuracy for the different fault types. In addition, owing to its adaptive processing of the dynamic information in raw data, it also performs better than the fusion of DWT with a conventional single hidden layer feedforward network such as MLP, which treats each sample independently during training and ignores the correlation between samples. At the same time, compared with the result of LSTM alone in Table 2, the proposed fusion method has a significantly lower fault diagnosis error rate because it fuses the diagnostic information of DWT through GOWA. In summary, the discriminative power of the fusion method is greater than that of DPCA + SVM, DLDA + SVM, and DWT + MLP. Thus, the proposed fusion method should be more suitable and effective for fault identification.
Furthermore, in order to evaluate the real-time performance of the fault diagnosis system, the motor current reduction, bus voltage insufficient, and work temperature over-high faults are injected into the RW. Then, the test results obtained by the proposed fault diagnosis system using the control feedback signal and the pitch angle are discussed.
4.3.1. Motor Current Reduction
Consider a motor current fault with at . Figure 8 illustrates the change of the satellite pitch attitude and feedback control signal in the case of motor current reduction. Although the fault parameter is a small value, the fault has a severe effect on the system. Figure 9 presents the fault diagnosis signals in the LSTM, DWT, and GOWA for motor current faults.
Due to the severity of the fault, all fault diagnosis blocks can detect the fault quickly. However, compared with the LSTM method, DWT and GOWA have faster response speed.
4.3.2. Bus Voltage Insufficient
Consider a bus voltage fault with at . Figure 10 indicates the satellite pitch attitude output and feedback control signal. Figure 11 presents the LSTM, DWT, and GOWA output signals under bus voltage fault. Similarly, the detection speed of GOWA is faster than that of the LSTM method.
4.3.3. Work Temperature Over-High
Consider a temperature fault with a slope of 0.4 in the system as follows:
Figure 12 illustrates the change of satellite pitch attitude and feedback control signal in case of temperature fault. Figure 13 illustrates the fault signals in the LSTM, DWT, and GOWA as the operating temperature continues to rise.
It is worth noting that due to the slow change of system state caused by fault, DWT alone cannot detect the occurrence of temperature fault. Therefore, only the output of LSTM plays a decisive role in fault detection in this condition.
5. Conclusions
Due to the inherent nonlinearities of the RW and the satellite attitude dynamics, as well as the impact of disturbances on the satellite, it is challenging to diagnose the RW effectively and accurately. In this paper, a fusion method based on LSTM and DWT is proposed to solve the problem of fault detection and identification of the actuator. Three common faults of the RW, namely, motor current reduction, bus voltage insufficient, and work temperature over-high, are studied, and three parallel fusion blocks are developed to detect and identify them.
In each block, fault information from LSTM and DWT is fused by the GOWA operator, so fault types can be determined synthetically. In addition, due to the use of LSTM, the dynamic information of process data can be used adaptively to improve the reliability of the system. Compared with the DPCA + SVM, DLDA + SVM, and DWT + MLP algorithms, the fusion method has better fault identification accuracy. Moreover, compared with the single LSTM method, it has better real-time fault detection ability, higher sensitivity, and a faster response to the fault.
Finally, many directions in fault diagnosis and prognosis remain to be explored in the future. Some of them are as follows:
(1) In the process of fault detection, transfer learning can be used to learn the identified unknown fault to enhance network intelligence.
(2) After the fault diagnosis, the prediction of the remaining useful life of the system can be further studied by using the time series prediction ability of LSTM.
Appendix
A block diagram representation of this RW model is shown in Figure 14.
The RW is mainly composed of three nonlinear parts: the EMF torque limiting loop, bearing friction and disturbance, and the speed limiter. Let the motor current and angular velocity be the states of the flywheel, the voltage signal output by the controller be the input, and the torque generated by the RW be the output. The state equation of the RW can be written aswhere and are the functions due to motor disturbances, is derived from the EMF torque limiting loop, is the sigmoidal function replacing the discontinuous sign function in the Coulomb friction, and is the speed limiter. Their expressions are as follows:where the typical constants are defined in Table 5.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
This research was partially supported by the Foundation of Graduate Innovation Center in Nanjing University of Aeronautics and Astronautics under Grant no. kfjj20181507.