Abstract

Realistic prognostic tools are essential for effective condition-based maintenance systems. In this paper, a Duration-Dependent Hidden Semi-Markov Model (DD-HSMM) is proposed, which overcomes the shortcomings of traditional Hidden Markov Models (HMM), including the Hidden Semi-Markov Model (HSMM): (1) it allows explicit modeling of the transition probabilities between states; (2) it relaxes the observation independence assumption by accommodating a connection between consecutive observations; and (3) it does not follow the unrealistic memoryless assumption of the Markov chain and therefore provides a more powerful modeling and analysis capability for real world problems. To facilitate computation with the proposed DD-HSMM methodology, a new forward-backward algorithm is developed. The proposed methodology is demonstrated and evaluated through a case study. The experimental results show that the DD-HSMM methodology is effective for equipment health monitoring and management.

1. Introduction

A fault is a change from the normal operating condition of a system to an abnormal condition, which occurs as a result of system performance degradation over time [1]. Diagnostics indicates the occurrence of a fault and its root cause. Prognostics is a fault prediction method; it involves detecting a pending fault before it occurs, identifying its root cause, and estimating the remaining useful life (RUL), which is also known as time-to-failure [2]. Condition-based Maintenance (CBM) is a maintenance program that recommends maintenance actions based on the information collected through condition monitoring.

A CBM program can be used for diagnostics or prognostics; regardless of the application, it follows three steps [3–5]. First, data relevant to events and system health are collected through data acquisition techniques. Data acquisition in CBM includes event-type data (i.e., information on what happened) and condition monitoring data, which are the measurements related to system health. Second, event and condition monitoring data are interpreted in the data processing step. Finally, maintenance decisions are made based on the interpretation and analysis of the data. In particular, to identify the weakest components and states and improve the efficiency of CBM, the integrated importance measure of multistate systems was introduced by Si et al. [6, 7]. An extensive survey on machine diagnostics and prognostics implementing condition-based monitoring can be found in Jardine et al. [8] and Heng et al. [9].

Analysis of event data alone is reliability analysis, which maps the event data over a time axis to determine the probability of events and uses the probability distribution to predict failures. Data acquisition in CBM, on the other hand, provides both event and condition monitoring data. Therefore, it is more effective to combine events and conditions in one model in order to do diagnostics or prognostics. The Hidden Markov model (HMM) is a technique for modeling and analyzing event and condition monitoring data together. It consists of two stochastic processes: a Markov chain with a finite number of states describing an underlying mechanism and an observation process depending on the hidden state [10–12]. An HMM contains a finite number of states connected by transitions. Each state is characterized by a transition probability and an observation probability [13].

Researchers have proposed a number of techniques to address the limitations of the standard HMM. The continuous variable duration HMM has been adopted in speech recognition, where results show that the absence of a correct duration model, as in the standard HMM, increases the error rate by 50% [14–16]. Another example is handwritten word recognition; due to the inherent ambiguity of the segmentation process in handwritten words, it is practical to use a variable duration model for the states in an HMM-based handwritten word recognition system [17, 18]. Recently, some researchers have applied the HMM to diagnostics and prognostics in machining processes [19, 20]. However, these studies use only the ordinary HMM technique, so the inherent limitation of the HMM, its implicit state duration model, remains in these models.

Prognostic methods used in CBM are often a combination of statistical inference and machine learning methods [4, 21–24]. Model-based methods assume that measured information is stochastically correlated with the actual machine condition. HMM identifies the actual machine conditions from observable monitored data through a statistical approach. HMM has been very effective in various applications ranging from speech recognition [10, 14, 25–27] to tool wear monitoring and machining [3, 20, 28, 29].

The primary advantage of the HMM is its robust mathematical foundation, which supports many practical applications in different areas of use. An added benefit of employing HMMs is the ease of model interpretation in comparison with pure “black-box” modeling methods such as artificial neural networks that are often employed in advanced diagnostic models [28]. However, an inherent limitation of the HMM approach is that its state duration implicitly follows an exponential distribution (a geometric distribution in discrete time). In other words, the HMM does not provide an adequate representation of temporal structure.
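The duration limitation can be made concrete with a short Python sketch. It is an illustration only, not part of the original study: the function name `hmm_duration_pmf` and the self-transition probability 0.9 are hypothetical. In a discrete-time HMM, a state with self-transition probability a_ii is held for exactly d steps with probability a_ii^(d-1)(1 - a_ii), so the most likely duration is always a single step.

```python
import numpy as np

def hmm_duration_pmf(a_ii: float, max_d: int) -> np.ndarray:
    """P(duration = d), d = 1..max_d, for an HMM state with self-transition a_ii."""
    d = np.arange(1, max_d + 1)
    return a_ii ** (d - 1) * (1.0 - a_ii)

# Hypothetical self-transition probability of 0.9
pmf = hmm_duration_pmf(0.9, 200)

# The pmf decreases monotonically, so the mode is d = 1 whatever the
# physical degradation process looks like.
most_likely_duration = int(np.argmax(pmf)) + 1

# The mean of the (truncated) distribution is close to 1 / (1 - a_ii).
expected_duration = float(np.sum(np.arange(1, 201) * pmf))
```

A semi-Markov model replaces this implied geometric law with an explicit duration distribution for each state.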

To overcome the limitations of the HMM in prognosis, Dong and He [30, 31] propose a Hidden Semi-Markov Model-based (HSMM) methodology that adds an explicit temporal structure to the HMM to predict the RUL of equipment. In this model, the states of the HSMM represent the health status of equipment, and the trained HSMM can be used to diagnose the health state of equipment. Through parameter estimation of the health-state duration probability distribution and the proposed backward recursive equations, the RUL of the equipment can be predicted [32]. Although the results from the HSMM are promising, the deterioration of the system within a single state is not taken into consideration in this model. The model assumes that the state transition probabilities remain constant while the system stays in the same state and that all observations are independent; these assumptions typically do not hold in real world applications.

This paper presents a new approach that expands the HSMM methodology [30, 31, 33] with duration-dependent state transition probabilities. Unlike the HSMM, the proposed Duration-Dependent Hidden Semi-Markov Model (DD-HSMM) does not follow the unrealistic memoryless assumption of the Markov chain and therefore provides a more realistic and powerful modeling and analysis capability. The major contribution of the DD-HSMM methodology is twofold: it allows explicit modeling of transition probabilities that depend not only on the state but also on the duration spent in that state, and it relaxes the observation independence assumption by accommodating a link between consecutive observations, which makes it more realistic in real world applications.

2. Theoretical Background

2.1. Description of General HSMM

A Hidden Semi-Markov Model (HSMM) extends the HMM by allowing the underlying process to be a semi-Markov chain with a variable duration, or sojourn time, for each state. The HSMM is well suited to estimating unobservable health states from observable sensor signals. For example, a small change in a bearing alignment could cause a small nick in the bearing, which could cause scratches in the bearing race and additional nicks, leading to complete bearing failure. This process can be well described by the HSMM. Let $s_t$ be the hidden state at time $t$ and $O$ the observation sequence; an HSMM is characterized by its parameters. The parameters of an HSMM are as follows: the initial state distribution (denoted by $\pi$), the transition model (denoted by $A$), the observation matrix (denoted by $B$), and the state duration distribution (denoted by $D$). Thus, an HSMM can be written as $\lambda = (\pi, A, B, D)$. For a given state $i$, $B$ is the probability matrix of the observation being $v_k$ at time $t$ and $v_l$ at time $t + 1$.
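As a minimal sketch of how these four parameter groups fit together, the Python code below stores them in a small container and samples a state and observation sequence with explicit durations. Everything in it is an assumption for illustration: the class name `HSMM`, the three-state left-to-right structure, and all numerical values are hypothetical and are not taken from the paper, and observations are drawn independently given the state.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HSMM:
    pi: np.ndarray  # (N,) initial state distribution
    A: np.ndarray   # (N, N) transitions taken when a state's duration ends
    B: np.ndarray   # (N, M) observation probabilities per state
    D: np.ndarray   # (N, Dmax) duration pmf, D[i, d-1] = P(duration d | state i)

def sample(model: HSMM, T: int, rng: np.random.Generator):
    """Pick a state, hold it for a sampled duration, emit one observation
    per time step, then move to the next state; stop after T steps."""
    states, obs = [], []
    s = rng.choice(len(model.pi), p=model.pi)
    while len(states) < T:
        d = rng.choice(model.D.shape[1], p=model.D[s]) + 1
        for _ in range(d):
            states.append(int(s))
            obs.append(int(rng.choice(model.B.shape[1], p=model.B[s])))
        s = rng.choice(len(model.pi), p=model.A[s])
    return states[:T], obs[:T]

# Hypothetical degradation model: healthy -> worn -> failed (absorbing)
model = HSMM(
    pi=np.array([1.0, 0.0, 0.0]),
    A=np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]),
    B=np.array([[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]),
    D=np.array([[0.0, 0.2, 0.5, 0.3], [0.1, 0.4, 0.4, 0.1], [1.0, 0.0, 0.0, 0.0]]),
)
states, obs = sample(model, 12, np.random.default_rng(0))
```

Because the duration pmf of the first state puts no mass on a duration of one step, every sampled sequence stays in the healthy state for at least two steps, something a plain HMM cannot enforce.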

2.2. State Transitions

In the HSMM, there are $N$ states, and the transitions between the states follow the transition matrix $A$; that is, $P(s_{t+1} = j \mid s_t = i) = a_{ij}$. As in the standard HMM, the state at time $t = 0$ is a special state “START.”

Although the transition between distinct health states is Markov, the state transition in general is usually not Markov. This is why the model is called “semi-Markov” [32]: in the HSMM, the conditional independence between the past and the future is ensured only when the process moves from one health state to another distinct health state.

2.3. Inference Procedures

Similar to the HMM, the HSMM has three basic problems to deal with, that is, the evaluation, decoding, and learning problems:
(1) evaluation (also called classification): given the observation sequence $O$ and an HSMM $\lambda$, what is the probability of the observation sequence given the model, that is, $P(O \mid \lambda)$;
(2) decoding (also called recognition): given the observation sequence $O$ and an HSMM $\lambda$, what sequence of hidden states most probably generates the given sequence of observations;
(3) learning (also called training): how do we adjust the model parameters $\lambda = (\pi, A, B, D)$ to maximize $P(O \mid \lambda)$.

Different algorithms have been developed for the above three problems. The most straightforward way of solving the evaluation problem is to enumerate every possible state sequence of length $T$ (the number of observations). However, the computational burden of this exhaustive enumeration is prohibitively high. Fortunately, there is a more efficient algorithm based on dynamic programming, called the forward-backward procedure. The goal of the decoding problem is to find the optimal state sequence associated with the given observation sequence. The most widely used optimality criterion is to find the single best state sequence (path) $S$, that is, to maximize $P(S \mid O, \lambda)$, which is equivalent to maximizing $P(S, O \mid \lambda)$. The Viterbi algorithm, which is also based on dynamic programming, is used to find this single best state sequence. For the learning problem, there is no known way to obtain an analytical solution. However, we can adjust the model parameters $\lambda$ such that $P(O \mid \lambda)$ is locally maximized using an iterative procedure, such as the Baum-Welch method (or equivalently the Expectation-Maximization algorithm).
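The contrast between exhaustive enumeration and dynamic programming can be sketched in Python for a plain HMM, which keeps the example short; the same ideas carry over to the HSMM with an extra duration index. The two-state model and the observation sequence are hypothetical values chosen only for illustration.

```python
import itertools
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(O | lambda) by the forward recursion, O(N^2 T) operations."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def joint_probability(pi, A, B, states, obs):
    """P(S, O | lambda) for one fixed state sequence S."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return float(p)

def viterbi(pi, A, B, obs):
    """Single best state sequence, maximizing P(S, O | lambda)."""
    delta = pi * B[:, obs[0]]
    back = []
    for o in obs[1:]:
        trans = delta[:, None] * A           # trans[i, j]: best path ending in i, then i -> j
        back.append(trans.argmax(axis=0))    # best predecessor of each state j
        delta = trans.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(back):               # follow the back-pointers from T to 1
        path.append(int(ptr[path[-1]]))
    return path[::-1], float(delta.max())

# Hypothetical two-state, two-symbol model
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = [0, 1, 1, 0]

all_paths = list(itertools.product(range(2), repeat=len(obs)))  # N**T candidates
brute_force = sum(joint_probability(pi, A, B, s, obs) for s in all_paths)
likelihood = forward_likelihood(pi, A, B, obs)
best_path, best_prob = viterbi(pi, A, B, obs)
```

Enumeration visits N^T paths, whereas the forward and Viterbi recursions touch each of the N^2 state pairs once per time step, which is what makes evaluation and decoding tractable for long observation sequences.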

3. Inference and Learning Mechanisms of DD-HSMM

3.1. Model Structure

Although the HSMM has an explicit state duration probability distribution $D$, its state transition probabilities are duration invariant. In this paper, we replace the duration-invariant state transition probabilities with duration-dependent state transition probabilities. The parameters of a DD-HSMM are as follows: the initial state distribution, denoted by $\pi$; the transition model, denoted by $A$; the observation matrix, denoted by $B = \{b_i(v_k)\}$ ($1 \le k \le M$, where $M$ is the number of observation symbols in $V$, and $V = \{v_1, \ldots, v_M\}$ is the set of observation symbols in state $i$); and the state duration distribution, denoted by $D$. Thus, a DD-HSMM can be written as $\lambda = (\pi, A, B, D)$. Here, when the duration in a given state $i$ is $d$, the state transition probability is $a_{ij}(d)$ ($1 \le i, j \le N$, $1 \le d \le D_{\max}$, where $N$ is the number of states and $D_{\max}$ is the maximum staying time in a state). The state transition probabilities satisfy the constraint
$$\sum_{j=1}^{N} a_{ij}(d) = 1. \tag{1}$$

3.2. Duration-Dependent State Transition Probability

In the DD-HSMM, the state transition probability distribution is $A = \{a_{ij}(d)\}$. We define the duration-dependent state transition probabilities as follows:
$$a_{ij}(d) = P(s_{t+1} = j \mid s_t = i,\ d_t(i) = d), \quad 1 \le i, j \le N,\ 1 \le d \le D_{\max}, \tag{2}$$
where $N$ and $D_{\max}$ are the number of states and the maximum duration in any state, respectively. Equation (2) represents the transition from state $i$ to state $j$, given that the duration in state $i$ at time $t$ is $d$. It indicates that, in the DD-HSMM, the state transition probability is not only state dependent but also duration dependent.
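One way to picture a duration-dependent transition model is as a three-dimensional array indexed by current state, elapsed duration, and next state. The Python sketch below builds such an array under purely illustrative assumptions that are not taken from the paper: a left-to-right structure, a probability of staying that shrinks linearly with the time already spent in the state, and the hypothetical parameters `base_stay` and `decay`.

```python
import numpy as np

def make_dd_transitions(n_states: int, max_d: int,
                        base_stay: float = 0.95, decay: float = 0.1) -> np.ndarray:
    """Return A with A[i, d-1, j] = a_ij(d) for a left-to-right model.

    Assumption (illustrative only): the probability of staying in state i
    decreases linearly with the duration d, and all remaining probability
    mass goes to the next state i + 1.
    """
    A = np.zeros((n_states, max_d, n_states))
    for i in range(n_states - 1):
        for d in range(1, max_d + 1):
            stay = max(base_stay - decay * (d - 1), 0.0)
            if d == max_d:
                stay = 0.0  # the state must be left once the maximum duration is reached
            A[i, d - 1, i] = stay
            A[i, d - 1, i + 1] = 1.0 - stay
    A[-1, :, -1] = 1.0      # the last (failure) state is absorbing
    return A

A = make_dd_transitions(n_states=4, max_d=5)
row_sums = A.sum(axis=2)    # every (state, duration) row is a probability distribution
```

With these settings the chance of leaving state 0 grows from 0.05 after one step to 0.35 after four steps, which is the kind of aging effect a duration-invariant transition matrix cannot express.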

3.3. Inference Procedures

Similar to the HSMM, the DD-HSMM has three basic problems to deal with, that is, the evaluation, recognition, and training problems. To facilitate the computation in the proposed DD-HSMM-based health prediction model, new forward-backward variables are defined and a modified forward-backward algorithm is developed in the following.

A dynamic programming scheme is employed for the efficient computation of the inference procedures. To implement the inference procedures, a forward variable $\alpha_t(i, d)$ is defined as the probability of generating $o_1 o_2 \cdots o_t$ and ending in state $i$ with duration $d$:
$$\alpha_t(i, d) = P(o_1 o_2 \cdots o_t,\ s_t = i,\ d_t(i) = d \mid \lambda).$$
The initial conditions are established at time $t = 1$ as follows: All unspecified values are zero. For time $t > 1$, where $a_{ij}(d)$ is the state transition probability from state $i$ to state $j$, given that the duration in state $i$ at time $t$ is $d$; $b_j(o_t)$ is the output probability of observation vector $o_t$ from state $j$; and $p_j(d)$ is the state duration probability of state $j$. $N$ is the number of states in the DD-HSMM and $D_{\max}$ is the maximum duration in a state.

Similar to the forward variable, the backward variable $\beta_t(i, d)$ can be written as

For the backward probability, the initial conditions are set at time $t = T$ as follows:

For time $t < T$,

Then the total probability can be computed by

3.4. Modified Forward-Backward Algorithm for DD-HSMM

In order to give reestimation formulas for all variables of the DD-HSMM, one DD-HSMM-featured forward-backward variable is defined:
$$\xi_t(i, j, d) = P(s_t = i,\ s_{t+1} = j,\ d_t(i) = d \mid O, \lambda).$$
In this equation, $\xi_t(i, j, d)$ is the probability of a state transition from state $i$ to state $j$ at time $t$ after being in state $i$ for a duration of $d$, given the model $\lambda$ and observation $O$. From the definition of the forward-backward variables, we can derive $\xi_t(i, j, d)$ as follows: Then, we have and the probability of being in state $i$ at time $t$ with a duration of $d$ is defined as $\gamma_t(i, d)$, and, from the definition of the forward-backward variables, we can easily derive $\gamma_t(i, d)$ as follows:

The forward-backward algorithm computes the following probabilities.

Forward Pass. The forward pass of the algorithm computes $\alpha_t(i, d)$.

Step 1 (initialization ($t = 1$)). The forward variable is shown as follows:

Step 2 (forward recursion ($t > 1$)). For ,

Backward Pass. The backward pass computes $\beta_t(i, d)$.

Step 1 (initialization ($t = T$)). The backward variable is shown as follows:

Step 2 (backward recursion ($t < T$)). For ,
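The structure of a duration-indexed forward pass can be sketched in Python. This is a simplified variant written under stated assumptions, not the recursion of this paper: the self-transition probability a_ii(d) absorbs the role of the explicit duration probability, durations beyond the maximum are pooled in the last slot, and the function names and the two-state model are hypothetical.

```python
import numpy as np

def dd_forward(pi, A, B, obs):
    """alpha[t, i, d-1] = P(o_1..o_t, s_t = i, time already spent in i = d).

    A[i, d-1, j] = a_ij(d). Simplifying assumption: durations longer than
    the maximum are pooled in the last duration slot.
    """
    N, D, _ = A.shape
    alpha = np.zeros((len(obs), N, D))
    alpha[0, :, 0] = pi * B[:, obs[0]]          # initialization at t = 1
    for t in range(1, len(obs)):                # forward recursion for t > 1
        for i in range(N):
            for d in range(D):
                mass = alpha[t - 1, i, d]
                if mass == 0.0:
                    continue
                for j in range(N):
                    # staying in i lengthens the duration, leaving i resets it to 1
                    d_next = min(d + 1, D - 1) if j == i else 0
                    alpha[t, j, d_next] += mass * A[i, d, j] * B[j, obs[t]]
    return alpha

def dd_likelihood(pi, A, B, obs):
    """Total probability P(O | lambda): sum of the final forward variables."""
    return float(dd_forward(pi, A, B, obs)[-1].sum())

# Hypothetical model: the chance of leaving state 0 grows with the time spent in it
pi = np.array([1.0, 0.0])
A = np.array([
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],   # from state 0 at d = 1, 2, 3 or more
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],   # state 1 is absorbing
])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
obs = [0, 0, 1, 1]
likelihood = dd_likelihood(pi, A, B, obs)
```

When every duration slice of `A` is identical, this recursion reduces to the ordinary HMM forward algorithm, which gives a convenient sanity check for an implementation.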

3.5. Parameter Reestimation for DD-HSMM

The reestimation formula for the initial state distribution is the probability that state $i$ was the first state, given $O$: The reestimation formula of the state transition probabilities is the ratio of the expected number of transitions from state $i$ to state $j$ to the expected number of transitions from state $i$:
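The ratio-of-expected-counts idea can be sketched in Python as follows. The sketch assumes that the transition posteriors are already available as an array `xi[t, i, d-1, j]` and that the counts are normalized separately for each state and duration, so that the reestimated probabilities sum to one over the next state; the array layout and the function name are hypothetical.

```python
import numpy as np

def reestimate_transitions(xi: np.ndarray) -> np.ndarray:
    """xi[t, i, d-1, j]: posterior probability of moving from state i, occupied
    for d steps, to state j at time t. Returns a_ij(d) as a ratio of expected counts."""
    expected_ij = xi.sum(axis=0)                          # expected i -> j moves at duration d
    expected_i = expected_ij.sum(axis=2, keepdims=True)   # expected moves out of (i, d)
    # (i, d) pairs that were never visited keep a zero row instead of dividing by zero
    return np.divide(expected_ij, expected_i,
                     out=np.zeros_like(expected_ij), where=expected_i > 0)

# Hypothetical posteriors for T = 5 steps, N = 2 states, 3 duration values
rng = np.random.default_rng(1)
xi = rng.random((5, 2, 3, 2))
xi[:, 1, 2, :] = 0.0          # state 1 never reached a duration of 3
A_hat = reestimate_transitions(xi)
```

Rows for state and duration pairs that never occur in the training data are left at zero here; a practical implementation would need some smoothing rule for them, which this sketch does not attempt.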

3.6. Training of State Duration Models Using Parametric Probability Distributions

In this paper, state duration densities are modeled by a single Gaussian distribution estimated from training data. The existing state duration estimation method trains the DD-HSMM and its duration densities simultaneously. However, this technique is inefficient because the training process requires large storage and a heavy computational load. Therefore, a new approach is adopted for training state duration models. In this approach, state duration probabilities are estimated on the lattice (or trellis) of observations and states, which is obtained in the DD-HSMM training stage.

The mean $\mu(i)$ and variance $\sigma^2(i)$ of the duration probability of state $i$ are determined by In these equations, $\gamma_t(i, d)$ is the probability of state $i$ at time $t$ with a duration of $d$ and can be expressed as
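A posterior-weighted estimate of the duration mean and variance can be sketched in Python as below. The sketch assumes the posteriors are stored as `gamma[t, i, d-1]` and uses them directly as weights over durations; whether this matches the exact weighting of the formulas above is an assumption, and the function name is hypothetical.

```python
import numpy as np

def duration_moments(gamma: np.ndarray):
    """gamma[t, i, d-1]: posterior probability of state i at time t with duration d.
    Returns the gamma-weighted mean and variance of the duration for each state."""
    weights = gamma.sum(axis=0)                   # (N, D) total weight per (state, duration)
    d = np.arange(1, gamma.shape[2] + 1)
    norm = weights.sum(axis=1)                    # assumes every state has nonzero weight
    mean = (weights * d).sum(axis=1) / norm
    var = (weights * (d - mean[:, None]) ** 2).sum(axis=1) / norm
    return mean, var

# Hypothetical posteriors: state 0 always lasts 3 steps,
# state 1 lasts 1 or 3 steps with equal weight
gamma = np.zeros((2, 2, 3))
gamma[0, 0, 2] = 1.0
gamma[1, 0, 2] = 1.0
gamma[0, 1, 0] = 0.5
gamma[1, 1, 2] = 0.5
mean, var = duration_moments(gamma)
```

The resulting mean and variance are the two parameters needed for the single Gaussian duration model of each state.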

4. DD-HSMM Based Health Prognostic

Many applications in the actuarial, econometric, engineering, and medical literature involve the use of the hazard rate (HR) function [33]. The mathematical properties of the HR function can reveal a variety of features in the data.

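Let $T$ denote the time to failure of an item under consideration, with lifetime distribution function $F(t)$ and reliability function $R(t)$, where $F(t) = P(T \le t)$ and $R(t) = 1 - F(t)$. Assume that $F(t)$ is differentiable, so that the density function $f(t)$ exists; then the HR function can be defined as
$$h(t) = \frac{f(t)}{R(t)} \approx \frac{\Delta N_f(t)}{[N - N_f(t)]\,\Delta t},$$
in which $N$ is the total number of sample items, $N_f(t)$ is the number of items that fail before time $t$, and $\Delta N_f(t)$ is the number of items that fail during the time interval $(t, t + \Delta t)$. The expected residual life (ERL) function $e(t)$ is the expected time remaining to failure, given that the system has survived to time $t$; then $e(t) = E[T - t \mid T > t]$, for $t$ such that $R(t) > 0$. Therefore, $h(t)\Delta t$ approximates the conditional probability of failure during the time interval $(t, t + \Delta t)$ given survival to time $t$.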
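The empirical hazard rate described above, the number of failures in an interval divided by the number of items still working at its start and by the interval length, can be sketched in Python. The function name and the ten failure times are hypothetical values used only for illustration.

```python
import numpy as np

def empirical_hazard(failure_times, bin_edges):
    """Estimate the hazard rate on each interval [t, t + dt) as
    (failures in the interval) / (items still working at t) / dt."""
    failure_times = np.asarray(failure_times, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    n_total = len(failure_times)
    failed_in_bin, _ = np.histogram(failure_times, bins=edges)
    failed_before = np.array([(failure_times < t).sum() for t in edges[:-1]])
    at_risk = n_total - failed_before
    dt = np.diff(edges)
    # intervals with no surviving items are reported as NaN
    return np.divide(failed_in_bin, at_risk * dt,
                     out=np.full(len(dt), np.nan), where=at_risk > 0)

# Hypothetical failure times (arbitrary time units) of 10 items
times = [0.5, 1.5, 1.6, 2.5, 2.6, 2.7, 3.5, 3.6, 3.8, 3.9]
hazard = empirical_hazard(times, [0, 1, 2, 3, 4])
```

For this sample the estimated hazard rises from 0.1 in the first interval to 1.0 in the last, the increasing pattern expected of a wear-out failure mode.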

Suppose that a machine will go through health states before entering failure state . Let denote the expected duration of the machine staying at health state ; based on the parameters estimated above, we can get as follows: And can be denoted by Then, once the machine has entered the health state , its expected residual life equals the summation of the expected residual duration of the machine staying at health state and the total remaining staying in the future health states before failure. Denote as the expected residual duration of the machine staying in the health state for . When the equipment entered state at time , the conditional probability of failure during can be defined as the probability that the machine will transit to any other state during the coming and the probability that the machine still stay at state . It can be seen from (9) and (10) that can be denoted as follows: Then The DD-HSMM equipment health prediction procedure is given as follows.

Step 1. From the DD-HSMM training procedure (i.e., parameter estimation), the state transition probability for the DD-HSMM can be obtained.

Step 2. Through the DD-HSMM parameter estimation, the duration probability density function for each health state can be obtained. Therefore, the duration mean and variance can be calculated.

Step 3. By classification, identify the current health status of the equipment.

Step 4. The remaining useful life (RUL) of the equipment can be predicted by the following formula (suppose that the equipment currently stays at health state $i$ with a duration of $d$):
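The final step combines the residual time in the current health state with the expected durations of all later health states. A deliberately simplified Python sketch of that sum is given below. It assumes that the residual time in the current state is the mean duration minus the time already spent there, floored at zero, which replaces the hazard-based residual duration used in this paper; the function name and the mean durations are hypothetical.

```python
def remaining_useful_life(mean_durations, current_state, time_in_state):
    """mean_durations[i]: expected duration of health state i (failure state excluded).

    Simplifying assumption: the residual time in the current state is its mean
    duration minus the time already spent there, floored at zero.
    """
    residual_current = max(mean_durations[current_state] - time_in_state, 0.0)
    future = sum(mean_durations[current_state + 1:])   # later health states before failure
    return residual_current + future

# Hypothetical mean durations (arbitrary time units) of three health states
mean_durations = [40.0, 25.0, 10.0]
rul = remaining_useful_life(mean_durations, current_state=1, time_in_state=5.0)
```

Here the equipment has spent 5 time units in the second health state, so the sketch returns 20 units of residual time in that state plus 10 units for the last health state.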

5. Case Study

In this case study, long-term wear experiments on rolling element bearings were conducted [1]. In order to collect an adequate number of data sets for the validation of the proposed scheme, three experiments under normal operating conditions, three experiments with a cage defect fault, and three experiments each with inner and outer race defect faults were run until the bearing reached a complete failure state and stopped operating. Bearing characteristic frequencies in the frequency domain are extracted from the vibration signals acquired during the experiments, which correspond to different degrees of bearing health.

During the test runs, vibration signals were collected under each condition. These signals were processed using a Mahalanobis-Taguchi System (MTS) based model in the original paper [1] and are used for the proposed DD-HSMM methodology in this paper. The expert judgment is expressed as an integer from 0 to 3, representing four system states, as follows:
0 → the bearing is operating normally;
1 → the bearing is operating and shows signs of deterioration; it is advisable to take some preventive action at the next planned maintenance;
2 → the bearing is operating but requires immediate attention;
3 → the bearing has failed.

5.1. Operation State Identification

In order to evaluate the accuracy of the operation state identification method proposed in this paper, experimental data with normal operating condition were obtained. The experimental data set included 50 samples for each state (denoted by 0, 1, 2, and 3). Of the 50 samples for each state, 20 were used to train the model, and the remaining 30 were used to validate it.

In the DD-HSMM, a Gaussian mixture distribution and a single Gaussian distribution were used to model the output probability distribution and the state duration densities, respectively, and the number of states is 4. The maximum number of iterations in the training process is set to 100 and the convergence error to 0.000001.
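The two stopping settings can be read as the usual pair of termination rules for an iterative reestimation loop. The Python sketch below shows one such loop; interpreting the convergence error as the change in likelihood between consecutive iterations is an assumption, and the function `train_until_converged`, the `step` callback, and the toy likelihood sequence are hypothetical.

```python
def train_until_converged(step, max_iter=100, tol=1e-6):
    """Call `step()` (one reestimation pass that returns the current
    log-likelihood) until the change drops below `tol` or `max_iter` is reached."""
    previous = float("-inf")
    for iteration in range(1, max_iter + 1):
        current = step()
        if abs(current - previous) < tol:
            return iteration, current
        previous = current
    return max_iter, previous

# Toy stand-in for a training pass: a log-likelihood that improves geometrically
values = iter([-(0.5 ** k) for k in range(200)])
iterations, final_ll = train_until_converged(lambda: next(values))
```

In a real training run, `step` would perform one forward-backward pass followed by parameter reestimation and return the resulting log-likelihood.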

The DD-HSMM-based training process is shown in Figure 1. The x-axis shows the training steps and the y-axis represents the likelihood probability of the different states. As can be seen from Figure 1, training for all four states reaches the set error in fewer than 40 steps. This fast convergence suggests that the model has the potential for real-time signal processing.

The classification results obtained on the remaining 30 data samples are shown in Table 1. As indicated in the results, the accuracy of the DD-HSMM method is 94.2%.

5.2. Health Prediction for RUL

As described before, a four-state DD-HSMM prediction model is constructed. In the training process, even if the device is in the same running condition, a different dwell time leads to different transition probabilities between states and a different mean and variance of the duration in each state. Tables 2 and 3 show the state transition probability and the mean and variance of the duration in each state when $i = 1$ and $d = 1$, representing the bearing in state 1 with a duration of 1. Tables 4 and 5 show the state transition probability and the mean and variance of the duration in each state when $i = 1$ and $d = 4$, representing the bearing in state 1 with a duration of 4.

First, the current operating state is determined from the recognition results; then the residence time is calculated according to the duration parameters of that operating state obtained in the training process. Next, the remaining effective life in the current operational state is calculated using (25). Finally, the RUL of the bearing is calculated using (26). Suppose that the bearing is now at state 1 with a duration of 1; then the following can be obtained: , by (25), and by (26).

5.3. Prediction Comparison

In order to compare the prognostic method based on the DD-HSMM with the prognostic method based on the HSMM, (29) is used to evaluate the life error. In (29), $\mathrm{RUL}_{\mathrm{actual}}$ represents the actual life of the component, and $\mathrm{RUL}_{\mathrm{forecasted}}$ represents the expected life predicted by the DD-HSMM or the HSMM:
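A common way to turn the two quantities into a single error figure is the relative absolute error. The Python sketch below uses that form as an assumption; it is not claimed to be identical to (29), and the function name and the numbers are hypothetical.

```python
def relative_rul_error(rul_actual: float, rul_forecasted: float) -> float:
    """Absolute prediction error as a fraction of the actual remaining life.
    Assumed error measure for illustration; not necessarily the form of (29)."""
    if rul_actual <= 0:
        raise ValueError("rul_actual must be positive")
    return abs(rul_actual - rul_forecasted) / rul_actual

# Hypothetical values: actual life of 100 time units, forecast of 90
error = relative_rul_error(100.0, 90.0)
```

Normalizing by the actual life makes errors comparable across bearings with very different lifetimes.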

Table 6 shows the prediction comparison of the DD-HSMM versus the HSMM. Failure prediction with the HSMM method is only state dependent, while the DD-HSMM method uses both state dependency and duration dependency. The DD-HSMM method has a self-updating capability, in which the historical data on states are used in the calculation of the state transition probability matrix. As indicated in the results, the DD-HSMM method is more accurate than the HSMM method.

6. Conclusion

This paper presents a Duration-Dependent Hidden Semi-Markov Model (DD-HSMM) for prognostics. In contrast to the Hidden Semi-Markov Model (HSMM), failure prediction with the DD-HSMM uses both state dependency and duration dependency. The two important aspects of equipment health monitoring, the stages of aging and the rate of aging, are taken into consideration in an integrated manner in the proposed DD-HSMM. The duration-dependent state transition probability makes decision-making more relevant to real world applications.

In order to facilitate the computational procedure, a new forward-backward algorithm and reestimation approach are developed. By using autoregression, the interdependency between observations is established in the model. By incorporating an explicitly defined temporal structure into the model, the DD-HSMM is capable of predicting the remaining useful life of equipment more accurately.

The proposed model is demonstrated using experimental data on rolling element bearings. In this case study, the model achieves a state recognition accuracy of 94.2% and predicts the remaining useful life more accurately than the HSMM. In order to draw general conclusions on the capabilities of the proposed DD-HSMM, more experimental data from various prognostics areas are needed.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support for this research from the National High Technology Research and Development Program of China (no. 2012AA040914), the National Natural Science Foundation of China (Grant no. 71101116), and the Basic Research Foundation of NPU (Grant no. JC20120228).