Abstract

Most of the existing fault detection methods rarely consider the cost-optimal maintenance policy. A novel multivariate Bayesian control approach is proposed, which enables the implementation of early fault detection for a helicopter gearbox with cost minimization maintenance policy under varying load. A continuous time hidden semi-Markov model (HSMM) is employed to describe the stochastic relationship between the unobservable states and observable observations of the gear system. Explicit expressions for the remaining useful life prediction are derived using HSMM. Considering the maintenance cost in fault detection, the multivariate Bayesian control scheme based on HSMM is developed; the objective is to minimize the long-run expected average cost per unit time. An effective computational algorithm in the semi-Markov decision process (SMDP) framework is designed to obtain the optimal control limit. A comparison with the multivariate Bayesian control chart based on hidden Markov model (HMM) and the traditional age-based replacement policy is given, which illustrates the effectiveness of the proposed approach.

1. Introduction

In a helicopter system, the mechanical drive system is the most efficient and compact device to transmit torque and change angular velocity. As the key mechanical transmission parts, gears are extensively used in aerospace, shipbuilding industry, and wind power industry, such as helicopters, cruisers, and wind turbines. Gear failure will cause machine halts in the whole mechanical system, resulting in great economic losses and even human casualties.

Condition monitoring and fault detection technique can significantly improve the reliability of the gear transmission system and reduce the occurrence of failure. In condition-based maintenance (CBM) and prognostics and health management (PHM), the vibration monitoring data obtained from the accelerometers carry a large amount of information, which were used in the fault detection of the gears for decades and were very helpful to prevent unnecessary machine halts [1, 2]. The accurate early fault detection can effectively prevent the occurrence of secondary damage. Appropriate preventive maintenance can significantly improve equipment availability and save cost for gearbox users. Therefore, the early gear fault detection and maintenance scheme optimization have become research hotspots in recent years.

In recent research, advanced nonparametric methods were widely applied to gearbox early fault detection, such as the continuous wavelet transform (CWT) [3], the multiscale chirplet path pursuit (MSCPP) [4], and multiresolution Fourier transform [5]. Some researchers applied parametric time series model to gearbox fault detection, most of whom assumed that gearbox ran under constant load conditions. For example, Wang and Wong [6] indicated that the autoregressive (AR) model with a high diagnostic confidence level can detect the gear crack much earlier than the conventional method. Rofe [7] used the autoregressive moving-average (ARMA) model under the condition of load fluctuation for gear fault diagnosis and selected the variance and kurtosis of the residuals of ARMA model as the indicators of failure. It should be noted that the fault diagnosis methods in the literature mostly assume the gears run under constant speed or constant load conditions, meaning that all the signal responses are due to component degradation or failures. This is not always consistent with the reality that gears often run under varying load, such as the wind turbine gearboxes and the helicopter planetary gearboxes [8, 9]. Under the assumption of constant load, it is difficult to separate the interference signals caused by load changes, which greatly affects the characteristics of the signals. In most cases, the change of the load variation in the feature level is similar to that caused by fault. This increases the difficulty of feature extraction and thereby also increases the difficulty of effective monitoring, diagnosis, and prediction. The gear motion residual (GMR) signal is an effective algorithm and highly insensitive to the varying load [10]. The healthy portion of the GMR signals is used to fit the VAR model and the residuals of the whole data sets are calculated to process the early fault detection scheme in this paper.

The operational states (healthy or unhealthy) cannot be observed directly when the gear system is running. Several approaches have been used to model the degradation process, such as proportional hazards model (PH) [11], stochastic filtering model (SFM) [12], and hidden Markov model (HMM) [13]. In recent years, HMM has been widely applied to speech recognition [14], system fault diagnosis [13, 15], and life prediction [16]. However, in HMM, the sojourn time in each operational state of the system is assumed to follow exponential distribution, that is, the “memoryless” property. As an extension of the HMM, the sojourn time in each operational state is not restricted by exponential distribution in a hidden semi-Markov model (HSMM), and its degradation process is much closer to the real degradation situations. Liu et al. [17, 18] proposed a diagnosis and prediction method of equipment based on the HSMM and the sojourn time follows normal distribution. Jiang et al. [19] carried out a small extension to HMM, considering a HSMM with the sojourn time obeys Erlang () distribution. A general Erlang HSMM is considered in this paper to model the deterioration process of the gear system. The expectation-maximization (EM) algorithm is employed to estimate the unknown state and observation parameters of the model. After the model parameters are estimated, the posterior probability that the system is in each state is used to derive the expressions for the reliability distribution function of remaining useful life (RUL), the probability density function (PDF), and the mean residual life (MRL) function.

When using the HSMM to model the deterioration of the gear system with the consideration of maintenance policy, it is necessary to consider the best stopping time. The Bayesian methods are used extensively to determine the stopping time. Girshick and Rubin [20] introduced the Bayesian method for the production process control for the first time in 1952, showing that repair was needed when the posterior probability was beyond the control limit. Considering the minimum maintenance cost, Kim et al. [13] put forward the optimal Bayesian forecasting model using the actual oil spectral analysis data. Most of the stochastic models assumed that the system degradation process was modeled by a continuous time HMM and each sojourn time of hidden states was exponentially distributed, which is not always realistic. Relaxing the assumption of Markovian process deterioration, Panagiotidou and Tagaras [21] proposed a model that integrated process control and maintenance for two operational states. However, this method is only applicable to univariate monitoring. Such a drawback of the existing optimization model motivates us to set up an optimal maintenance model under real circumstances. In order to find the optimal stopping time, based on the HSMM, a Bayesian control scheme is established with the objective of minimizing the long-run expected average cost per unit time to determine the optimal stopping time and when to perform the preventive maintenance actions.

To the best of our knowledge, this is the first paper applying HSMM with general Erlang distribution of sojourn time in two hidden states and optimal Bayesian control scheme to find the cost-effective strategy for early fault detection of the helicopter gearbox. With the condition monitoring information, the proposed approach can not only update the remaining useful life at each sampling epoch, but also process early fault detection of the helicopter gear system and determine the optimal stopping time with minimum cost simultaneously.

The remainder of the paper is organized as follows. In Section 2, the healthy part of GMR signal of the helicopter gearbox life test vibration data is used to fit a VAR model, and the residuals of the whole data sets are obtained. In Section 3, the HSMM is applied, in which each operational state obeys the general Erlang distribution, to describe the degradation process of the gear system, and the EM algorithm is employed to estimate the unknown state and observation parameters of the model. The conditional reliability distribution functions of the RUL and the MRL function are derived. In Section 4, on the basis of hidden semi-Markov degradation modeling, an optimal Bayesian control chart is developed, and the optimal control limit is solved in the semi-Markov decision process framework, followed by a comparison with other methods. The conclusions are presented in Section 5.

2. Residuals Computation for a Helicopter Gearbox

2.1. Experimental Scheme

The overall process of the optimal Bayesian control scheme for the helicopter gearbox early fault detection is illustrated schematically in Figure 1. Firstly, we need to obtain the residuals of the multivariate condition monitoring data. The test data were obtained from the Mechanical Diagnostic Test Bed (MDTB) in Pennsylvania State University, as shown in Figure 2.

Gearbox used in this test includes a pinion gear with 21 teeth and a drive gear with 70 teeth. The gearbox was driven by induction motors (22.38 kW, 1750 rpm) and the torques were provided by alternating current absorption motors (55.95 kW, 1750 rpm). The power of the gearbox is 3.73~14.92 kW and the speed ratio range is 1.2 : 1~6. #2 and #3 accelerometers were installed on the gearbox body, respectively, in the axial and horizontal directions, as shown in Figure 3.

Samples were taken every 8 minutes (0.1333 h) and stored in a new file with a sampling frequency of 20 kHz. Each sampling time is 10 s. The 16-bit resolution AD converter was used to ensure the accelerometers have sufficient precision. Driving motor speed V01 and gearbox output torque V05 were also collected with sampling frequency of 1 kHz and sampling time of 10 s. The test is divided into two stages: at a certain rotating speed, gearbox firstly was run at 100% rated output torque for 96 hours (555 in-lbs); it was then run under varying load until failure. In the second stage, the output torque was periodically increased from 50% to 300% rated torque (50%, 100%, 150%, 200%, 250%, and 300%) and then dropped from 300% to 50% rated torque (300%, 250%, 200%, 150%, 100%, and 50%) gradually. The output torque of the gearbox is shown in Figure 4. Figure 5 shows the mean value of the input shaft speed. The input shaft speed fluctuation error in Figure 5 is within 0.06%; therefore the speed can be considered as a constant.

The gearbox was run normally in the first stage for 96 hours; thus only the vibration data collected in the second stage (varying load) were analyzed. Sampling points collected by #2 and #3 sensors were, respectively, saved in files A02 and A03, and each of them contained 145 data files under varying load. The original vibration data is shown in Figure 6. The results of shutdown fault inspection were shown in Figure 7 that the pinion gear was normal while five fully broken and two partially broken teeth were found in the drive gear.

2.2. Gear Motion Residual (GMR) Signal

TSA algorithm, which extracts the meshing frequency components from the gear vibration data and then synchronously adds and averages them with the rotation of the gear shaft, is widely used in fault diagnosis of gears. The number of sampling points in a single rotational period can be calculated using the following formula:where is sampling frequency, is the fundamental meshing frequency of the gear, is the number of teeth of the gear, and is the ceil function.

Suppose that , data points were contained in each sampling data file, the rotation period number of the gear is given bywhere represents the floor function.

In a complete revolution of the gear of interest, the TSA signal is given by

The TSA signal of the helicopter gearbox is shown in Figure 8.

It is efficacious to use the TSA method to reduce the influence of the background noise and the nongear vibration source. GMR signal, which can be obtained by removing the gear meshing frequency and its harmonic from the TSA signals, is highly insensitive to the varying load conditions [10]. The GMR signal can be expressed as follows:where is the signal composed of the eliminated components.

The GMR signal of the gear of interest is shown in Figure 9. It can be seen from the comparison between Figure 9 and the TSA signal that the GMR signal effectively removes the noise, representing more obvious signal change features. Further, we consider a VAR model for the healthy part of the GMR signal to obtain the residual.

2.3. GMR Residuals Computation Using VAR Model

In this section, the healthy portion of the GMR signal is used to fit a VAR model. Two-dimensional data can be expressed as , . Assume that the healthy portion of the data obeys a stationary VAR process:where is the mean value; is the model order; is the autocorrelation matrices; are independent identically distributed (i.i.d) and obey ; is the covariance. Let , so (5) can be expressed with the following form:where

Reinsel [22] indicated that and can be obtained using least squares estimates, and the AIC criterion is selected in this paper to determine the model order . Thus the estimated VAR model parameters can be utilized to calculate the residual of the TSA signals:where .

For ,

For , the residual can be computed recursively by Kalman filter, for which the details can be found in the Appendix.

Using the above method, we obtain the GMR residual signal of the gear, which is shown in Figure 10. To compress the huge amount of data, we select the standard deviation, which is commonly used characteristic parameter in the vibration signal processing, to process the residual of the gear, as shown in Figure 11.

Figure 11 illustrates that the residual standard deviation values of A02 and A03 are relatively stable from files 1 to 100; therefore, we assume the system is in the healthy state in this portion, while from files 101 to 145 the system operates in the unhealthy state as the standard deviation values of this part show an obvious increase compared with those of the healthy portion. Normality and independence tests are taken for these two parts of data, the results of which are shown in Table 1.

From Table 1, both the standard deviation values for A02 and A03 passed the normality and independence tests, indicating that standard deviation values of the residual are independent and obey the multivariate normal distribution, which is consistent with the assumption that the observation of the HSMM described in Section 3 obeys multivariate normal distribution.

3. Residual Life Prediction Based on Hidden Semi-Markov Model

3.1. Hidden Semi-Markov Model

It has been demonstrated from in [13, 19, 23] that a 3-state Markov model is adequate to model the deterioration process of the system. We assume that the degradation process of the system is described by a 3-state HSMM, the state space of which is . States 0 and 1 are unobservable and represent the good and warning system states, respectively. Only state 2 is assumed to be observable and represents the failure state. Suppose the system starts in the healthy state, that is, . While the sampling interval is fixed, the observation of the system is independent given the state of the gear system. is used to denote the -dimension observation vector at each sampling epoch; thus given the state obeys -dimension multivariate normal distribution with , , and the probability density function is given bywhere , are unknown observation parameters.

The state transition of the gear system is illustrated in Figure 12. The transition probability matrix is , where represents the probability that the system transits to state after it leaves from state .

Generally, the transition probability matrix of the system is assumed as follows:Eq. (11) indicates that the system starts in the healthy state, where .

Suppose that in state for , the sojourn time obeys the general Erlang distribution, and the probability density function is given bywhere is the unknown shape parameter and is unknown rate parameter. Erlang distribution can also be modeled as a series of a given number of exponential phases running one after another until the end of the sojourn time. In order to record the phases in the Markov model, we need to enlarge the state space. Let be the new state space, where represents the set of healthy states, represents the set of unhealthy states, and represents the failure state.

The system’s failure time is denoted by . Let denote the data histories and and denote the sets of state and observation parameters to be estimated, respectively. Since the sample path () of the state process in HSMM is unobservable, it is difficult to determine the analytical expression of the maximum likelihood function. The expectation-maximization (EM) algorithm can solve it by iteratively maximizing the pseudo likelihood function. Let and be the initial values; the EM algorithm is shown in Figure 13.

Updated parameters in each step are then used as new initial values to the E-step, leading the iteration in E-step and M-step until the Euclidean norm satisfies , for small given . Detailed expressions of likelihood function and pseudo likelihood function can be found in Khaleghei and Makis [23]. The state parameters can be updated by calculating the unique stationary point from

The explicit expressions of updated in each step are given by

The updated in each step can be obtained by

For observation parameters, the unique stationary point can be obtained by

Thus the explicit expressions of the updated in each step are given bywhere the form , , , , , , and .

The initial values to and were assigned and EM algorithm was employed to solve the parameters to be estimated, the convergence criterion for which is . We assigned different initial values and the results were very similar, which are shown in Tables 2 and 3.

3.2. Remaining Useful Life (RUL) Prediction

Let be the posterior probability that given the observation data the system is in state at sampling time :where , denotes the posterior probability vector. Suppose the system always starts in the first phase of the healthy state, that is, . According to the Bayes’ theorem, the posterior probability can be updated iteratively at each sampling epoch by the following formula:where .

It can be obtained from (10) that for ; thuswhere .

For the gearbox degeneration system, the estimated shape parameters of Erlang distribution in the hidden states are (see Table 2); therefore the transition probability matrix can be obtained by solving Kolmogorov backward differential equations:

Suppose at sampling time , the gear system has not failed, that is, ; then for and , the conditional reliability function of RUL of the gear system is

Thus the PDF is:

Given the model parameters and , (22) and (23) can be used to update the conditional reliability function and the PDF using the condition monitoring data obtained from each sampling epoch. Taking file 88 to file 93 for analysis, the corresponding conditional reliability functions are shown in Figure 14 and the probability density functions in Figure 15, where “” represents the actual residual values corresponding to each sampling file. As can be seen from Figure 15, from file 89, the RUL distribution of the gear system is highly concentrated; the main cause is that with the steady accumulation of condition monitoring data, the uncertainty of the remaining distribution lifetime gradually decreases while the accuracy of the RUL prediction continually improves.

The MRL of the gear system is given by:

The data from file 1 to file 95 are selected for MRL prediction. After each sampling completed, the life prediction is performed when the new observations are available, and then the predicted value is compared with the values that are obtained using HMM. Meanwhile, as illustrated in Table 4, the relative error (RE) between actual remaining useful lives and the prediction results using different models are calculated. As can be seen in Table 4, with the increase of collected data, the RE in the prediction results using HSMM are smaller than those using HMM, which is closer to the actual values.

4. Optimal Bayesian Control Scheme

4.1. Optimal Bayesian Control Chart Based on HSMM

In this section, we design a multivariate Bayesian control chart to detect the early fault of the helicopter gearbox. The optimal control limit is used to determine the stopping time, of which the objective is to minimize the long-run expected average cost per unit time. In partially observable Markov decision process (POMDP), it is well known that the posterior probability that the system is in the unhealthy state provides enough information for maintenance decision making [13]. Let be the posterior probability that the helicopter gear system is in the unhealthy state:where the initial value .

By renewal theory, the cost minimization problem with fixed sampling interval is equivalent to seeking an optimal value of , such thatwhere and CL denote, respectively, the cycle cost and cycle length.

The Bayesian control scheme is illustrated in Figure 16. For the fixed control limit , at each decision epoch, the posterior probability vector is updated by Bayes’ theorem. At the sampling time , if , the system will continue to run, of which the cost for each sampling is . If , system should be stopped and full inspection is performed to determine whether the system is in state 0 or not, for which the inspection cost rate and inspection time are, respectively, and . The gear system has probability to be in state 1 and probability to be in state 0. If the system is found to be in state 0 after a full inspection, then no maintenance measures are taken and the system continues operating in the healthy condition. If the system is found to be in state 1, preventive maintenance measures will be taken with the maintenance cost rate and the maintenance time . If the posterior probability is not beyond the fixed control limit but still fails, that is, a random failure occurs, we need to immediately take failure replacement measures with corresponding maintenance cost rate and replacement time and , respectively. When the system takes full inspection, preventive maintenance, or failure replacement, the lost production cost rate will be incurred. When the system operates in the warning state, additional operating cost rate and maintenance cost rate will be generated. We further assume that after the inspection, the preventive maintenance, or failure replacement, the system will begin from a new cycle.

We design an effective algorithm in the semi-Markov decision process (SMDP) framework to obtain the optimal control limit. In order to discretize the interval , it is sufficient to choose the number of subintervals to provide effective accuracy. For , we define the SMDP in state ; that is, if the posterior probability in state lies in the interval , the posterior probability in state lies in the interval , and the posterior probability in state lies in the interval . If , the preventive maintenance is performed, then the SMDP is defined in state . If the system fails, the SMDP is defined in state . Let , then the state space for the SMDP is defined as , where presents the gear system is in state one at the initial moment.

For the fixed control limit , can be obtained by solving the following linear equations:where is the relative value until the next decision epoch given the current state . is the expected cost incurred until the next decision epoch given the current state . is the expected sojourn time until the next decision epoch given the current state . is the probability that at the next decision epoch the system will be in state given the current state is .

Next, we derive the closed form expressions of each quantity in (27). For each , , and , for , the transition probabilities of SMDP are calculated by the following formula:

From (10) we can further obtain:where , , , .

Thus (18) can be expressed as follows:where .

In (29), for , is computed as follows:

For , the probability can be derived using the similar method in (32) and is omitted. So the first item in (29) can be written as follows:

Since , , the probability in (33) can be calculated using Theorem  3.1 proposed by Provost and Rudiuk [24]. Next, the other transition probabilities in SMDP are derived.

If , after full inspection, the probability of system in state 1, that is, true alarm occurs, is given by

Also, after full inspection, the gear system may be found in healthy state; the probability is given by

If , the probability of failure for the gear system is given by

The remainder transition probabilities are given by

If , the corresponding expected cost and expected sojourn time are given by

If , the expected cost and expected sojourn time can be computed as follows:

If the gear system is in or state, the corresponding expected costs and expected sojourn times are given by

By setting the maintenance time parameters  h,  h, and  h and maintenance cost parameters , , , , , , and , we coded the computational algorithm in MATLAB R2015b to calculate the different expected average costs under different control limits . The optimal control limit was and the corresponding average cost was equal to 113.59. The Bayesian control chart of the helicopter gear system is shown in Figure 17. It is observed that based on HSMM the posterior probability firstly exceeds the optimal control limit at file 89, which indicates that the system need to be stopped to take full inspection at this moment.

4.2. Comparison with Other Maintenance Policies

Using the same maintenance time and cost parameters, the optimal control limit was obtained in the Bayesian control chart based on HMM, in which the sojourn time in each hidden states follows exponential distribution. The corresponding average cost was equal to 121.72. It should be noted that the average cost concluded by HSMM was less than that by HMM. As shown in Figure 17, while based on HMM, the posterior probability firstly exceeds the optimal control limit at file 91. The multivariate Bayesian control chart indicates that the proposed Bayesian control scheme can detect the fault of the gear of interest much earlier than the method based on HMM.

Next, we considered the traditional age-based replacement policy, which is well known that it does not take into account condition monitoring information. For the helicopter gearbox, the expected cycle cost () under age-based replacement policy is the sum of failure cost, additional operating and maintenance costs while the system operates in unhealthy state and the preventive maintenance cost:where is the stopping time.

The expected cycle length () is the sum of the expected time under normal operation, replacement time caused by failure, and preventive maintenance time:

Then the long-run expected average cost per unit time under age-based replacement policy is equivalent to finding the optimal stopping time , such that

Table 5 shows the optimal values obtained under different policies, which are, respectively, the multivariate Bayesian control chart based on HSMM and HMM and the age-based replacement policy. It is interesting to find that the expected average cost using the Bayesian control chart based on the HSMM is considerably lower than the other policies, while the cost obtained by the age-based replacement policy is the highest due to the irrespective of condition monitoring information. This is reasonable because, for the helicopter gear system, the cost incurred by failure is always higher than the cost incurred by inspection and preventive maintenance measures.

5. Conclusions

In this paper, we have proposed a novel optimal Bayesian control scheme for the early fault detection of the partially observable helicopter gear system. The GMR signal was selected for data preprocessing to eliminate the influence of load variation from the original signal of the helicopter gearbox. The healthy portion of GMR signal was used to fit a VAR model; thus the residuals of whole data history were obtained. Without the restriction that the sojourn time in each hidden state was exponentially distributed, the general Erlang distribution was considered for modeling the gear system’s sojourn time in each of the hidden states in a 3-state HSMM. The EM algorithm was employed to estimate the unknown parameters of the model. Using HSMM to describe the deterioration process of the gear system, several important quantities for the gearbox residual life prediction, such as conditional reliability distribution function, PDF and MRL were derived in terms of the posterior probability that the system was in each hidden state. The comparison with the life prediction method based on HMM indicated that the relative errors were smaller using HSMM. An optimal multivariate Bayesian control chart with cost-effective criterion was developed. Moreover, the optimal control limit was solved in the SMDP framework. The comparison with the Bayesian control chart based on HMM as well as the age-based replacement policy indicated that the proposed multivariate Bayesian control scheme provided accurate early fault detection to the helicopter gear system with the minimum cost.

In this research, we have considered the multivariate Bayesian control chart under fixed sampling interval; further improvement can be obtained by considering two or more sampling intervals or the availability maximization criterion to obtain the optimal control limit, which can be a suitable topic for the future work.

Appendix

Formula (6) can be rewritten as a state space model; the corresponding state and observation equations are given bywhereand are i.i.d. .

For each , we define

Then, the following recursive equations of Kalman filter are given byand the initial values are given by

For each , using the recursive equations above we obtain

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful to Professor Viliam Makis in University of Toronto for discussions. This paper was cosupported by the National Natural Science Foundation of China (nos. 60939003 and 61079013), the Funding of Jiangsu Innovation Program for Graduate Education (no. KYLX15_0313), and the Fundamental Research Funds for the Central Universities (no. NS2015072). Also, the support provided by China Scholarship Council (no. 201606830028) during a visit of Xin Li at the University of Toronto is acknowledged and appreciated.