Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 278120, 23 pages

http://dx.doi.org/10.1155/2015/278120

## Hidden Semi-Markov Models for Predictive Maintenance

¹Electronics and Informatics Department (ETRO), Vrije Universiteit Brussel (VUB), Plainlaan 2, 1050 Brussels, Belgium

²Interuniversity Microelectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium

Received 9 October 2014; Accepted 28 December 2014

Academic Editor: Hang Xu

Copyright © 2015 Francesco Cartella et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Realistic predictive maintenance approaches are essential for the condition monitoring and maintenance of industrial machines. In this work, we propose Hidden Semi-Markov Models (HSMMs) with (i) no constraints on the state duration density function and (ii) applicability to continuous or discrete observations. To deal with such a type of HSMM, we also propose modifications to the learning, inference, and prediction algorithms. Finally, automatic model selection is made possible using the Akaike Information Criterion. This paper describes the theoretical formalization of the model as well as several experiments performed on simulated and real data to validate the methodology. In all performed experiments, the model correctly estimates the current state and effectively predicts the time to a predefined event with a low overall average absolute error. As a consequence, it can be beneficial in real-world settings, especially where the Remaining Useful Lifetime (RUL) of the machine must be calculated in real time.

#### 1. Introduction

Predictive models that are able to estimate the current condition and the Remaining Useful Lifetime of industrial equipment are of high interest, especially for manufacturing companies, which can use them to optimize their maintenance strategies. If we consider that maintenance costs are one of the largest parts of operational costs [1] and that the maintenance and operations departments often comprise about 30% of the manpower [2, 3], it is not difficult to estimate the economic advantages that such innovative techniques can bring to industry. Moreover, predictive maintenance, where the Remaining Useful Lifetime (RUL) of the machine is calculated in real time, has been shown to significantly outperform other maintenance strategies, such as corrective maintenance [4]. In this work, RUL is defined as the time from the current moment until the system fails [5]. *Failure*, in this context, is defined as a deviation of the delivered output of a machine from the specified service requirements [6] that necessitates maintenance.

Models like Support Vector Machines [7], Dynamic Bayesian Networks [8], clustering techniques [9], and data mining approaches [10] have been successfully applied to condition monitoring, RUL estimation, and predictive maintenance problems [11, 12]. State space models, like Hidden Markov Models (HMMs) [13], are particularly suitable for industrial applications, due to their ability to model the latent state, which represents the health condition of the machine.

Classical HMMs have been applied to condition assessment [14, 15]; however, their use in predictive maintenance has not been effective, because they intrinsically model the state duration as a geometric distribution.

To overcome this drawback, a modified version of the HMM, which takes into account an estimate of the duration in each state, has been proposed in the works of Tobon-Mejia et al. [16–19]. Thanks to this explicit modeling of the state sojourn time, they showed that the RUL of industrial equipment can be estimated effectively. However, a drawback of their model is that the state duration is always assumed to be Gaussian distributed and the duration parameters are estimated empirically from the Viterbi path of the HMM.

A complete specification of a duration model, together with a set of learning and inference algorithms, was first given by Ferguson [20]. In his work, Ferguson allowed the underlying stochastic process of the state to be a semi-Markov chain instead of the simple Markov chain of an HMM. Such a model is referred to as a Hidden Semi-Markov Model (HSMM) [21]. HSMMs and explicit duration models have proven beneficial for many applications [22–25]. A complete overview of different classes of duration models has been given by Yu [26]. Most state duration models used in the literature are nonparametric discrete distributions [27–29]. As a consequence, the number of parameters that describe the model and have to be estimated is high, and the learning procedure can be computationally expensive for complex real applications. Moreover, the maximum duration allowed in each state must be specified a priori.

To alleviate the high dimensionality of the parameter space, parametric duration models have been proposed. For example, Salfner [6] proposed a generic parametric continuous distribution to model the state sojourn time. However, in his model the observations are assumed to be discrete, and the model is applied to recognize failure-prone observation sequences. Using continuous observations, Azimi et al. [30–32] specified an HSMM with a parametric duration distribution belonging to the Gamma family and modeled the observation process with a Gaussian.

Inspired by the latter two approaches, in this work we propose a generic specification of a parametric HSMM, in which no constraints are placed on the model of the state duration or on the observation processes. In our approach, the state duration is modeled as a generic parametric density function. The observations, on the other hand, can be modeled either as a discrete stochastic process or as a continuous mixture of Gaussians. The latter has been shown to approximate arbitrarily closely any finite, continuous density function [33]. The proposed model can thus be used in a wide range of applications and with different types of data. Moreover, in this paper we introduce a new and more effective estimator of the time spent by the system in a given state prior to the current time. To the best of our knowledge, apart from the works cited above, the literature on HSMMs applied to prognosis and predictive maintenance for industrial machines is limited [34]. Hence, the present work aims to show the effectiveness of the proposed duration model in solving condition monitoring and RUL estimation problems.

When dealing with state space models, and HSMMs in particular, one should define the number of states, the correct family of duration densities, and, in the case of continuous observations, the adequate number of Gaussian mixture components. Such parameters play a prominent role, since the right model configuration is essential to accurately model the dynamic pattern and the covariance structure of the observed time series. The estimation of a satisfactory model configuration is referred to as *model selection* in the literature.

While several state-of-the-art approaches use expert knowledge to gain insight into the model structure [15, 35, 36], an automated methodology for model selection is often required. Model selection has been studied in depth for a wide range of models. Among the existing methodologies, information-based techniques have been extensively analyzed in the literature with satisfactory results. Although the Bayesian Information Criterion (BIC) is particularly appropriate for finite mixture models [37, 38], the Akaike Information Criterion (AIC) has been shown to outperform BIC when applied to more complex models and when the sample size is limited [39, 40], which is the case in the target application of this paper.

In this work, AIC is used to estimate the correct model configuration, with the final goal of automated HSMM model selection that exploits only the information available in the input data. While model selection techniques have been extensively used in the framework of Hidden Markov Models [41–43], to the best of our knowledge the present work is the first to propose their application to duration models, and in particular to HSMMs.

In summary, the present work contributes to condition monitoring, predictive maintenance, and RUL estimation problems by

- (i) proposing a general Hidden Semi-Markov Model applicable to continuous or discrete observations and with no constraints on the density function used to model the state duration;
- (ii) proposing a more effective estimator of the state duration variable $d_t(i)$, that is, the time spent by the system in the $i$th state prior to the current time $t$;
- (iii) adapting the learning, inference, and prediction algorithms to the defined HSMM parameters and the proposed estimator;
- (iv) using the Akaike Information Criterion for automatic model selection.

The rest of the paper is organized as follows: in Section 2 we introduce the theory of the proposed HSMM together with its learning, inference, and prediction algorithms. Section 3 gives a short theoretical overview of the Akaike Information Criterion. Section 4 presents the methodology used to estimate the Remaining Useful Lifetime using the proposed HSMM. In Section 5 experimental results are discussed. The conclusion and future research directions are given in Section 6.

#### 2. Hidden Semi-Markov Models

Hidden Semi-Markov Models (HSMMs) introduce the concept of variable duration, which results in greater modeling accuracy when the system being modeled shows a dependence on time.

In this section we give the specification of the proposed HSMM, for which we model the state duration with a parametric state-dependent distribution. Compared to nonparametric modeling, this approach has two main advantages:

- (i) the model is specified by a limited number of parameters; as a consequence, the learning procedure is computationally less expensive;
- (ii) the model does not require a priori knowledge of the maximum sojourn time allowed in each state, since this is inherently learnt through the duration distribution parameters.

##### 2.1. Model Specification

A Hidden Semi-Markov Model is a doubly embedded stochastic model with an underlying stochastic process that is not observable (hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations. An HSMM allows the underlying process to be a semi-Markov chain with a variable duration or sojourn time for each state. The key concept of HSMMs is that the semi-Markov property holds for this model: while in HMMs the Markov property implies that the value of the hidden state at time $t$ depends exclusively on its value at time $t-1$, in HSMMs the probability of transition from state $S_i$ to state $S_j$ at time $t$ depends on the duration spent in state $S_i$ prior to time $t$.

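In the following we denote the number of states in the model as $N$, the individual states as $S = \{S_1, \ldots, S_N\}$, and the state at time $t$ as $s_t$. The semi-Markov property can be written as

$$P(s_{t+1} = S_j \mid s_t = S_i, s_{t-1}, \ldots, s_1) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N, \tag{1}$$

where the duration variable $d_t(i)$ is defined as the time spent in state $S_i$ prior to time $t$.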

Although the state duration is inherently discrete, in many studies [44, 45] it has been modeled with a continuous parametric density function. Similar to the work of Azimi et al. [30–32], in this paper we use the discrete counterpart of the chosen parametric probability density function (pdf). With this approximation, if we denote the pdf of the sojourn time in state $S_i$ as $f(x;\theta_i)$, where $\theta_i$ represents the set of parameters of the pdf relative to the $i$th state, the probability that the system stays in state $S_i$ for exactly $d$ time steps can be calculated as $\int_{d-1}^{d} f(x;\theta_i)\,dx$. Considering the HSMM formulation, we can generally denote the state-dependent duration distributions by the set of their parameters relative to each state as $\Theta = \{\theta_1, \ldots, \theta_N\}$.
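As a minimal sketch of this discretization, the Python snippet below assumes a Gaussian duration density (chosen here only because its CDF is available in closed form through `math.erf`) and computes $\int_{d-1}^{d} f(x;\theta_i)\,dx = F(d;\theta_i) - F(d-1;\theta_i)$. The function names and the parameter values (mean 20 steps, standard deviation 4) are hypothetical.

```python
import math


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """F(x; theta) for a Gaussian duration density with theta = (mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def duration_pmf(d: int, mu: float, sigma: float) -> float:
    """Probability of staying exactly d steps: integral of f(x; theta) over [d-1, d]."""
    return gaussian_cdf(d, mu, sigma) - gaussian_cdf(d - 1, mu, sigma)


# Hypothetical state: mean sojourn time of 20 steps, standard deviation of 4 steps.
pmf = [duration_pmf(d, mu=20.0, sigma=4.0) for d in range(1, 61)]
print(f"P(D = 20) = {pmf[19]:.4f}, total mass on d = 1..60: {sum(pmf):.6f}")
```

Because a Gaussian places a little mass below zero, the probabilities over $d \ge 1$ sum to slightly less than one; with these parameters the deficit is negligible, while a density supported on positive values (for example a Gamma) avoids it altogether.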

Many related works on HSMMs [31, 32, 44, 45] choose $f(x;\theta_i)$ within the exponential family. In particular, Gamma distributions are often used in speech processing applications. In this work, we do not impose a specific type of distribution function to model the duration. The only requirement is that the duration be modeled as a positive quantity, since negative durations are physically meaningless.

HSMMs also require the definition of a “dynamic” transition matrix, as a consequence of the semi-Markov property. Unlike HMMs, in which a constant transition probability leads to a geometrically distributed state sojourn time, HSMMs explicitly define a transition matrix which, depending on the duration variable, has increasing probabilities of changing state as time goes on. For convenience, we specify the state duration variable in the form of a vector $\mathbf{d}_t$ with dimensions $N \times 1$ as

$$\mathbf{d}_t = \left[d_t(1), d_t(2), \ldots, d_t(N)\right]^{\top}. \tag{2}$$

The quantity $d_t(i)$ can be easily calculated by induction from $d_{t-1}(i)$ as

$$d_t(i) = s_t(i)\, s_{t-1}(i)\, d_{t-1}(i) + 1, \tag{3}$$

where $s_t(i)$ is $1$ if $s_t = S_i$, and $0$ otherwise.
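Recursion (3) can be checked on a known state path. The sketch below is illustrative only: it assumes 0-based state indices and initializes every counter to $d_1(i) = 1$, and the helper names are hypothetical. It applies (3) to a fully observed path, not the estimator the paper proposes for hidden states.

```python
from typing import List, Sequence


def update_durations(d_prev: Sequence[int], s_prev: int, s_curr: int) -> List[int]:
    """Apply (3), d_t(i) = s_t(i) * s_{t-1}(i) * d_{t-1}(i) + 1, to every state i.

    s_prev and s_curr are the indices of the states at times t-1 and t;
    the indicator s_t(i) equals 1 only for i == s_curr.
    """
    return [(1 if i == s_curr else 0) * (1 if i == s_prev else 0) * d_prev[i] + 1
            for i in range(len(d_prev))]


def duration_vectors(path: Sequence[int], n_states: int) -> List[List[int]]:
    """Duration vectors d_1, ..., d_T along a known state path."""
    d = [1] * n_states  # assumption: every counter starts at 1 at t = 1
    history = [d]
    for t in range(1, len(path)):
        d = update_durations(d, path[t - 1], path[t])
        history.append(d)
    return history


traj = duration_vectors([0, 0, 0, 1, 1, 2], n_states=3)
for t, d in enumerate(traj, start=1):
    print(t, d)
```

Under (3), only the entry of a state that is occupied at two consecutive times keeps growing; every other entry is reset to 1.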

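If we assume that at time $t$ the system is in state $S_i$, we can formally define the duration-dependent transition matrix as $\mathbf{A}_{\mathbf{d}_t} = [a_{ij}(\mathbf{d}_t)]$ with

$$a_{ij}(\mathbf{d}_t) = P(s_{t+1} = S_j \mid s_t = S_i, d_t(i)), \quad 1 \le i, j \le N. \tag{4}$$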

The specification of the model can be further simplified by observing that, at each time $t$, the matrix $\mathbf{A}_{\mathbf{d}_t}$ can be decomposed into two terms: the recurrent and the nonrecurrent state transition probabilities.

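The recurrent transition probabilities $\mathbf{P}(\mathbf{d}_t) = [p_{ij}(\mathbf{d}_t)]$, which depend only on the duration vector $\mathbf{d}_t$ and the parameters $\Theta$, take into account the dynamics of the self-transition probabilities. They are defined as the probability of remaining in the current state at the next time step, given the duration spent in the current state prior to time $t$:

$$p_{ii}(\mathbf{d}_t) = P(s_{t+1} = S_i \mid s_t = S_i, d_t(i)) = \frac{\int_{d_t(i)}^{+\infty} f(x;\theta_i)\,dx}{\int_{d_t(i)-1}^{+\infty} f(x;\theta_i)\,dx}. \tag{5}$$

The denominator in (5), $\int_{d_t(i)-1}^{+\infty} f(x;\theta_i)\,dx$, is the probability that the system, at time $t$, has been staying in state $S_i$ for at least $d_t(i)-1$ time units. This expression is equivalent to $1 - F(d_t(i)-1;\theta_i)$, where $F(\cdot;\theta_i)$ is the duration cumulative distribution function relative to the state $S_i$, that is, $F(d;\theta_i) = \int_{-\infty}^{d} f(x;\theta_i)\,dx$. As a consequence, from (5) we can define the recurrent transition probabilities as a diagonal matrix with dimensions $N \times N$:

$$\mathbf{P}(\mathbf{d}_t) = \operatorname{diag}\left(\frac{1 - F(d_t(1);\theta_1)}{1 - F(d_t(1)-1;\theta_1)}, \ldots, \frac{1 - F(d_t(N);\theta_N)}{1 - F(d_t(N)-1;\theta_N)}\right). \tag{6}$$

The cumulative functions in (6) tend to 1 as the duration tends to infinity. Since the product of the self-transition probabilities over $d$ consecutive steps telescopes to $\frac{1 - F(d;\theta_i)}{1 - F(0;\theta_i)}$, the probability of remaining in state $S_i$ indefinitely vanishes: the probability of self-transition tends to decrease as the sojourn time increases, and the model always leaves the current state as time approaches infinity.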
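A minimal sketch of (5) and (6) follows, again assuming a Gaussian duration density purely for illustration; the survival function $1 - F$ is computed with `math.erfc` to keep precision in the upper tail. The function names and parameter values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_survival(x: float, mu: float, sigma: float) -> float:
    """1 - F(x; theta) for a Gaussian duration, via erfc to keep tail precision."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def self_transition_prob(d: int, mu: float, sigma: float) -> float:
    """p_ii(d_t) = (1 - F(d)) / (1 - F(d - 1)), as in (5) and (6)."""
    denom = gaussian_survival(d - 1, mu, sigma)
    if denom == 0.0:
        return 0.0  # no probability mass left beyond d - 1: the state is left
    return gaussian_survival(d, mu, sigma) / denom


def recurrent_matrix(d: Sequence[int],
                     theta: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Diagonal matrix P(d_t) from the duration vector d_t and Theta = [(mu_i, sigma_i)]."""
    n = len(d)
    return [[self_transition_prob(d[i], *theta[i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical state with Gaussian duration (mu = 20, sigma = 4).
for d in (5, 15, 20, 25, 30):
    print(d, round(self_transition_prob(d, 20.0, 4.0), 4))
```

With a Gaussian, which is log-concave, the printed probabilities decrease monotonically with $d$. A density with a decreasing hazard rate, such as a Gamma with shape parameter below one, would instead yield self-transition probabilities that grow with $d$.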

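The nonrecurrent state transition probabilities $\mathbf{Q} = [q_{ij}]$ rule the transitions between two different states. They are represented by an $N \times N$ matrix with diagonal elements equal to zero, defined as

$$q_{ij} = \begin{cases} P(s_{t+1} = S_j \mid s_t = S_i,\ s_{t+1} \neq S_i), & i \neq j, \\ 0, & i = j. \end{cases} \tag{7}$$

$\mathbf{Q}$ must be specified as a stochastic matrix; that is, its elements have to satisfy the constraint $\sum_{j=1}^{N} q_{ij} = 1$ for all $1 \le i \le N$.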

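As a consequence of the above decomposition, the dynamics of the underlying semi-Markov chain can be defined by specifying only the state-dependent duration parameters $\Theta$ and the nonrecurrent matrix $\mathbf{Q}$, since the model transition matrix can be calculated, at each time $t$, using (6) and (7):

$$\mathbf{A}_{\mathbf{d}_t} = \mathbf{P}(\mathbf{d}_t) + \left(\mathbf{I} - \mathbf{P}(\mathbf{d}_t)\right)\mathbf{Q}, \tag{8}$$

where $\mathbf{I}$ is the identity matrix. If we denote the elements of the dynamic transition matrix $\mathbf{A}_{\mathbf{d}_t}$ as $a_{ij}(\mathbf{d}_t)$, the stochastic constraint $\sum_{j=1}^{N} a_{ij}(\mathbf{d}_t) = 1$ for all $i$ and $t$ is guaranteed by the fact that $\mathbf{P}(\mathbf{d}_t)$ is a diagonal matrix and $\mathbf{Q}$ is a stochastic matrix: the $i$th row of (8) sums to $p_{ii}(\mathbf{d}_t) + \left(1 - p_{ii}(\mathbf{d}_t)\right)\sum_{j=1}^{N} q_{ij} = 1$.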

For several applications it is necessary to model the *absorbing state*, which, in the case of industrial equipment, corresponds to the “broken” or “failure” state. If we denote the absorbing state as $S_k$ with $k \in \{1, \ldots, N\}$, we must fix the $k$th row of the nonrecurrent matrix $\mathbf{Q}$ to be $q_{kk} = 1$ and $q_{kj} = 0$ for all $1 \le j \le N$ with $j \neq k$, as an exception to the zero diagonal of (7). By substituting such a matrix in (8), it is easy to show that the elements $a_{kk}(\mathbf{d}_t) = 1$ and $a_{kj}(\mathbf{d}_t) = 0$ remain constant for all $\mathbf{d}_t$, while the duration parameters $\theta_k$ have no influence on the absorbing state $S_k$. An example of absorbing state specification will be given in Section 5.
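The assembly of (8), including an absorbing state, can be sketched in a few lines of plain Python. The 3-state left-to-right structure and the self-transition values below are hypothetical; in a real model the diagonal of $\mathbf{P}(\mathbf{d}_t)$ would come from (6).

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(p_diag: List[float], q: Matrix) -> Matrix:
    """A(d_t) = P(d_t) + (I - P(d_t)) Q, where P(d_t) = diag(p_diag).

    Because P(d_t) is diagonal, row i of (I - P(d_t)) Q is (1 - p_ii) * q[i].
    """
    n = len(p_diag)
    return [[(p_diag[i] if i == j else 0.0) + (1.0 - p_diag[i]) * q[i][j]
             for j in range(n)]
            for i in range(n)]


# Hypothetical left-to-right model 0 -> 1 -> 2, with state 2 absorbing:
# its row of Q is [0, 0, 1] instead of having a zero diagonal.
Q = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
p_diag = [0.9, 0.7, 0.3]  # self-transition probabilities at some time t
A = transition_matrix(p_diag, Q)
for row in A:
    print([round(a, 3) for a in row])
```

Whatever value the third entry of `p_diag` takes, the last row of `A` stays $[0, 0, 1]$, which matches the statement that $\theta_k$ has no influence on the absorbing state.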

With respect to the input observation signals, in this work we consider both continuous and discrete data, adapting the observation model to the nature of the observations. In particular, for the continuous case, we model the observations with a multivariate mixture of Gaussian distributions. This choice presents two main advantages: (i) a multivariate model can handle multiple observations at the same time, which is often the case in industrial equipment modeling, since multiple sensor measurements are available at each time step; and (ii) a mixture of Gaussians has been proved to closely approximate any finite and continuous density function [33]. Formally, if we denote by $\mathbf{O}_t$ the observation vector at time $t$ and by $\mathbf{x}$ the generic observation vector being modeled, the observation density for the $j$th state is represented by a finite mixture of $M$ Gaussians

$$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}), \quad 1 \le j \le N, \tag{9}$$

where $c_{jm}$ is the mixture coefficient for the $m$th mixture component in state $S_j$, which satisfies the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$ and $c_{jm} \ge 0$ for $1 \le j \le N$ and $1 \le m \le M$, while $\mathcal{N}$ is the Gaussian density with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $\mathbf{U}_{jm}$ for the $m$th mixture component in state $S_j$.
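To illustrate (9), the sketch below evaluates $b_j(\mathbf{x})$ for a single state. To stay dependency-free it assumes diagonal covariance matrices $\mathbf{U}_{jm}$, a special case of the full covariances allowed by (9); the helper names, the two-sensor setting, and all numbers are hypothetical.

```python
import math
from typing import Sequence


def diag_gaussian_pdf(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Multivariate Gaussian density N(x; mean, diag(var)), accumulated in log space."""
    log_p = 0.0
    for xk, mk, vk in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vk) + (xk - mk) ** 2 / vk)
    return math.exp(log_p)


def mixture_density(x: Sequence[float],
                    weights: Sequence[float],
                    means: Sequence[Sequence[float]],
                    variances: Sequence[Sequence[float]]) -> float:
    """b_j(x) = sum_m c_jm * N(x; mu_jm, U_jm) for a single state j, as in (9)."""
    if any(c < 0.0 for c in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture coefficients must be non-negative and sum to 1")
    return sum(c * diag_gaussian_pdf(x, mu, var)
               for c, mu, var in zip(weights, means, variances))


# Hypothetical state observed through two sensors, with M = 2 components.
b = mixture_density([0.2, 1.1],
                    weights=[0.6, 0.4],
                    means=[[0.0, 1.0], [2.0, 3.0]],
                    variances=[[1.0, 0.5], [0.8, 0.8]])
print(f"b_j(x) = {b:.5f}")
```

Working in log space inside `diag_gaussian_pdf` avoids underflow of the product over many sensor dimensions until the final `exp`.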

In the case of discrete data, we model the observations within each state with a nonparametric discrete probability distribution. In particular, if $L$ is the number of distinct observation symbols per state, and if we denote the symbols as $X = \{X_1, \ldots, X_L\}$ and the observation at time $t$ as $O_t$, the observation symbol probability distribution can be defined as a matrix $\mathbf{B} = [b_j(l)]$ of dimensions $N \times L$, where

$$b_j(l) = P(O_t = X_l \mid s_t = S_j), \quad 1 \le j \le N,\ 1 \le l \le L. \tag{10}$$

Since the system in each state at each time step can emit one of the $L$ possible symbols, the matrix $\mathbf{B}$ is stochastic; that is, it is constrained to $\sum_{l=1}^{L} b_j(l) = 1$ for all $1 \le j \le N$.
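For the discrete case, a small sketch can check the stochastic constraint on $\mathbf{B}$ and evaluate (10) along a known state path. The $3 \times 3$ matrix, the symbol interpretation, and the helper names are hypothetical; the product over time relies on the observations being conditionally independent given the hidden states.

```python
from typing import List, Sequence

# Hypothetical N = 3 states and L = 3 symbols (e.g. low, medium, high vibration).
B: List[List[float]] = [
    [0.70, 0.20, 0.10],
    [0.20, 0.60, 0.20],
    [0.05, 0.15, 0.80],
]


def is_row_stochastic(matrix: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """True if every row is non-negative and sums to one, as required for B."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in matrix)


def emission_likelihood(emission: Sequence[Sequence[float]],
                        states: Sequence[int], symbols: Sequence[int]) -> float:
    """Product of b_{s_t}(O_t) along a known state path (0-based indices)."""
    prob = 1.0
    for s, o in zip(states, symbols):
        prob *= emission[s][o]
    return prob


print(is_row_stochastic(B), emission_likelihood(B, [0, 0, 2], [0, 1, 2]))
```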

Finally, as in the case of HMMs, we specify the initial state distribution $\pi = \{\pi_i\}$, which defines the probability of the starting state as

$$\pi_i = P(s_1 = S_i), \quad 1 \le i \le N. \tag{11}$$

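From the above considerations, two different HSMM models can be considered. In the case of continuous observations, the HSMM is characterized by $\lambda = (\mathbf{Q}, \Theta, C, \boldsymbol{\mu}, \mathbf{U}, \pi)$, where $C = \{c_{jm}\}$, $\boldsymbol{\mu} = \{\boldsymbol{\mu}_{jm}\}$, and $\mathbf{U} = \{\mathbf{U}_{jm}\}$ collect the mixture parameters of (9); in the case of discrete observations, it is characterized by $\lambda = (\mathbf{Q}, \Theta, \mathbf{B}, \pi)$. An example of a continuous HSMM with 3 states is shown in Figure 1.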
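To see how the pieces of a discrete HSMM $\lambda = (\mathbf{Q}, \Theta, \mathbf{B}, \pi)$ interact, the sketch below draws a synthetic state and symbol sequence: $s_1 \sim \pi$, emissions from $\mathbf{B}$, the dynamic row of (8) built from the self-transition probability (6), and the duration update (3). It is a generative illustration under stated assumptions, not the learning or inference algorithms of the paper: Gaussian durations, $d_1(i) = 1$, and the function name and all parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def gaussian_survival(x: float, mu: float, sigma: float) -> float:
    """1 - F(x; theta) for a Gaussian duration density (an illustrative choice)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def sample_discrete_hsmm(pi: Sequence[float],
                         Q: Sequence[Sequence[float]],
                         theta: Sequence[Tuple[float, float]],
                         B: Sequence[Sequence[float]],
                         T: int,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw T hidden states and symbols from lambda = (Q, Theta, B, pi)."""
    rng = random.Random(seed)
    n = len(pi)
    state = rng.choices(range(n), weights=pi)[0]  # s_1 ~ pi, as in (11)
    d = [1] * n                                    # assumption: d_1(i) = 1
    states: List[int] = []
    symbols: List[int] = []
    for _ in range(T):
        states.append(state)
        # Emit O_t ~ b_{s_t}(.), as in (10).
        symbols.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        # Self-transition probability p_ii(d_t) from (6).
        denom = gaussian_survival(d[state] - 1, *theta[state])
        p_stay = gaussian_survival(d[state], *theta[state]) / denom if denom > 0.0 else 0.0
        # Row of A(d_t) from (8): stay with p_stay, otherwise jump according to Q.
        row = [(p_stay if j == state else 0.0) + (1.0 - p_stay) * Q[state][j]
               for j in range(n)]
        nxt = rng.choices(range(n), weights=row)[0]
        # Duration update (3): d_{t+1}(i) = s_{t+1}(i) * s_t(i) * d_t(i) + 1.
        d = [(1 if i == nxt else 0) * (1 if i == state else 0) * d[i] + 1
             for i in range(n)]
        state = nxt
    return states, symbols


# Hypothetical 3-state left-to-right model with absorbing "failure" state 2.
pi = [1.0, 0.0, 0.0]
Q = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
theta = [(10.0, 2.0), (8.0, 2.0), (1.0, 1.0)]  # theta of state 2 has no effect
B = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.05, 0.15, 0.8]]
states, symbols = sample_discrete_hsmm(pi, Q, theta, B, T=50, seed=1)
print(states)
```

Sequences generated this way, with a known ground truth, are one convenient way to sanity-check an implementation of the model before moving to real sensor data.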