`Mathematical Problems in EngineeringVolume 2010, Article ID 242567, 16 pageshttp://dx.doi.org/10.1155/2010/242567`
Research Article

## An Expectation Maximization Algorithm to Model Failure Times by Continuous-Time Markov Chains

1Department of Mathematics, Faculty of Science, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
2School of Electrical Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China

Received 5 June 2010; Revised 28 July 2010; Accepted 29 July 2010

Copyright © 2010 Qihong Duan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In many applications, the failure rate function may present a bathtub shape curve. In this paper, an expectation maximization algorithm is proposed to construct a suitable continuous-time Markov chain which models the failure time data by the first time reaching the absorbing state. Assume that a system is described by methods of supplementary variables, the device of stage, and so on. Given a data set, the maximum likelihood estimators of the initial distribution and the infinitesimal transition rates of the Markov chain can be obtained by our novel algorithm. Suppose that there are transient states in the system and that there are failure time data. The devised algorithm only needs to compute the exponential of upper triangular matrices for times in each iteration. Finally, the algorithm is applied to two real data sets, which indicates the practicality and efficiency of our algorithm.

#### 1. Introduction

Among many quantitative analysis methods of system reliability, the state space representation has often been employed by reliability engineers, for example, [15]. The state space representation has been recognized by industrial IEC61511 standards [6]. This method assumes that the structure of the system is modelled with a continuous-time Markov chain (CTMC). In the state space representation, nodes represent states of the system and arcs represent the transitions between nodes. This method is well adapted to study the reliability of various systems and allows an exact analysis of their probability of failure.

In the state space representation method, the time interval between two consecutive transitions is a random variable of an exponential distribution. However, many studies have presented counterexamples. For some mechanical and electronic components, the failure rate function of time to failure (TTF) has a bathtub curve: a monotonically decreasing function initially, eventually becoming a constant, and finally changing to a monotonically increasing function after sufficient time elapses [7]. Because existing approaches do not provide a perfect solution to deal with this class of systems, several researchers have explored models with nonexponential distributions (see [810] and references therein). Among many approaches treating TTF with nonexponential distributions, the extended Markov-model [11] is recommendable. In the extended Markov-model, an operation state is divided into substates with different levels of failure rates, which result in a nonconstant failure rate of the operation state. That is, in these models, the bug of failure rate functions is fixed within the framework of the state space representation (see [12, 13] and references therein for some applications).

Methods of supplementary variables [14] and the device of stages [15] are two classical approaches of extended Markov-models. These two approaches allow one to model a wide range of experimental probability density functions through a set of additional stages in series or in parallel. For supplementary variables with given parameters, a formula for evaluating the occurrence of failures is derived in [16]. In fact, with suitable parameters, additional stages can approximate the behavior of many types of distributions in practice. Reference [17] is a recent application of these approaches.

Assume that the structure of a system is established. That is, the nodes and arcs of the system are given. For a experimental data set on the TTFs of the system, the bottleneck of the above two approaches is the estimation of parameters for the nodes and arcs. Since the system is modelled with a CTMC, the TTFs are phase-type distributed random variables. The problem is similar to phase-type fitting methods [18]. However, traditional phase-type fitting methods give the maximum likelihood estimators of the minimal representation [18], which may not correspond to the system with the expected structure. Moreover, from the standpoint of computational complexity, traditional methods are not suitable for the system with many paths.

Inspired by the theory of continuous-time Markov chains observed in continuous-time channels (see [19, 20]), we propose a new expectation maximization (EM) algorithm for data of TTFs in this paper. In Section 2 we specify the framework and hidden variables for the system. In Section 3 we derive several conditional probabilities and conditional expectations, which will be used in the EM algorithm. In Section 4 we present the new EM algorithm. In Section 5, we provide numerical results. Section 6 concludes the paper.

#### 2. System Definition and Hidden Variables

Assume that a system is modelled by a CTMC with transient states , and one absorbing state . The infinitesimal transition rates of satisfy The quantities , are unknown. However, the characteristic matrix , which represents the structure of the system, is the known. Assume that the characteristic is an upper triangular matrix, that is, for . We suppose that the initial probability function satisfies and are unknown.

Example 2.1 (The device of stages). The system has a state transition diagram shown in Figure 1. The characteristic matrix of this system is Hence is an upper triangular matrix.

Figure 1: The state transition diagram of a device of stages.

Example 2.2 (The parallel system). The system has a state transition diagram shown in Figure 2. The characteristic matrix of this system is Hence is also an upper triangular matrix.

Figure 2: The state transition diagram of a parallel system.

Like the above two examples, the structures of many practical systems can be represented by upper triangular characteristic matrixes.

We define a path, the sequence of nodes visited from an initial node to the absorbing node . The th path, , is characterized by a set of nodes , representing the nodes visited before absorption, where for and . Since the characteristic matrix and hence the infinitesimal transition rates , are upper triangular matrixes, any node will never appear again in a path. So the length of a path is no more than , and hence the number of paths is finite.

According to the above discussion, similarly to part b) of Theorem in [21], we can provide the following noncanonical construction of a Markov chain with the transition rates and the initial distribution , respectively. The hidden variables, which comprise the noncanonical construction, will serve the EM algorithm in Section 4.

Lemma 2.3. For , let be a sequence of exponentially distributed random variables with parameters , respectively. For , let be a sequence of random variables with discrete distributions described by probability functions Assume also that the sequences comprise an independent family. Define Then the constructed process is Markovian with the initial distribution and the infinitesimal transition rates . Moreover, are independent.
For a sample of , Lemma 2.3 indicates that there is a unique path corresponding to the discrete-time embedded Markov chain . We denote this path as . Write if the node appears in the path . Given a path , define the first node for this path, and the following node for any node .
A sample of the TTFs for a system is a family of independent identically distributed random variables. Precisely, the stochastic processes , are independent, which have the same infinitesimal transition rates and the same initial probability function . Then define Using the above notations, we have Based on the observations and the known characteristic matrix , our problem is to give estimators and for the unknown parameters.

#### 3. Conditional Probabilities and Conditional Expectations

To establish an EM algorithm for our problem, some conditional probabilities and conditional expectations will be calculated in this section. From now on, we will omit the superscript of the stochastic processes when the clarity is being held. Firstly, we recall the cumulative distribution function of the TTF , Here is the probability function of , which satisfies the Kolmogorov's forward equation . From the solution to the equation in exponential form, we have where the vector of dimensions is the transpose of . So the probability density function of the TTF is The above methodology can be used to calculate other probabilities and expectations in this section.

Lemma 3.1. For the above noncanonical construction of in Lemma 2.3, we have for , Here is a matrix of dimensions determined as follows: and are vectors of dimensions ,

Proof. Define the events Considering all possible realizations of , it follows from (2.8) that Since the characteristic matrix is an upper triangular matrix, the numbers of nodes along a path are monotonic increasing. Then if satisfies , we have and
Considering another stochastic process with the infinitesimal transition rates and the initial probability function , we can deduce a noncanonical construction consisting of . It follows from (3.9) that For a path of the stochastic process , the property means that the end of the path is the absorbing state . Similarly to (3.2), we have The result follows from the above equation and

Similarly to the discussion in Lemma 3.1, we have the following Lemma.

Lemma 3.2. For the noncanonical construction of in Lemma 2.3, , given , the conditional density function of is where and .

in Lemma 3.2 can be expressed through the 1-step transition probability matrix of the embedded Markov chain of the continuous-time Markov chain with the infinitesimal transition rates . However, since will be eliminated in the further propositions, it is not necessary to present the expression.

Theorem 3.3. For the noncanonical construction of in Lemma 2.3, , , and , we have Here and are the same as those in Lemma 3.1. is a matrix of dimensions determined as follows:

Proof. If , it is obvious that . Otherwise, by constructing an auxiliary stochastic process with the infinitesimal transition rates and the initial probability function , we can obtain, similarly to the proof of Lemma 3.1, that where . The required result can then be derived in the similar way as that in Lemma 3.1.

Theorem 3.4. For the noncanonical construction of in Lemma 2.3 and , we have where . is an identity matrix and is a matrix of dimensions determined as follows: is vectors of dimensions determined as follows:

Proof. It follows from (2.8) that given , where and are independent random variables. Given , the conditional density function of is Given , suppose that the conditional density function of is , where for . Since the Jacobian of the transformation is , the joint density of is given by . Then for , we have Here the conditional density function , given , is determined through (3.13) in Lemma 3.2.
As to the cumulative distribution function of , given , we consider another stochastic process with the infinitesimal transition rates and the initial probability function . It follows from Lemma 2.3 that has a noncanonical construction consisting of . Similarly to the discussion in Lemma 3.1, we have where . Hence Then it follows from (3.13) and (3.24) that Since the probability in (3.25) may jump at , the conditional density function corresponding to (3.24) is where is the Dirac function. is the same as that in Lemma 3.2. Then the integral in (3.22) becomes The desired result follows from the above equation and Lemma 3.2.

Remark 3.5. The integral in (3.17) can be efficiently computed by using an approach developed by Van Loan [22]. To apply that approach here, we first compute the exponential of the following block triangular matrix: Then the upper right block of this matrix gives the integral . Moreover, the matrix exponentials can be efficiently performed by using the diagonal Padé approximation with repeated squaring, as recommended in [22, 23].

Remark 3.6. Sometimes, the denominator of (3.17) is . For example, under some characteristic matrices, a node may never be reached. Since the estimation of is nonsense under this case, we can assign any value to . We suggest .

The following corollary can be deduced from the above theorems and the tower property of conditional expectations.

Corollary 3.7. For the noncanonical construction of in Lemma 2.3 and , we have Here is the unit basis vectors with “1” in the th position. and .

Proof. The first equation follows from Theorem 3.4 and the tower property of conditional expectations. The second equation follows from As to the third equation, we have From the solution to the Kolmogorov's forward equation in the exponential form, we obtain Similarly to (3.12), we can see that the third equation follows from the above two equations and (3.2).

#### 4. EM Algorithm

The EM algorithm is a two-step iterative process for computing the maximum likelihood estimate. This process is terminated when some imposed measure of convergence for the sequence of estimators is satisfied. A filter-based form of the EM algorithm for a Markov chain was presented in [24] and was developed in [25]. Here we review firstly the filter-based EM framework.

Let index a given family of probability density functions for some hidden random variable , where is an element of a parameter space . Then based on a hidden complete data , a likelihood can be written as Let a field . For a random variable , denotes the expectation of with assumption that is the parameter of s. Suppose another field .

Two iterative steps in the EM algorithm are as follows.

() Expectation step: fix , then compute , where

() Maximization step: Set as a best solution to the following optimization problem:

Under this framework, we propose an EM algorithm which attempts to find a maximum likelihood estimate (MLE) of the parameter . Here is a field generated from a sequence of TTFs. As for the hidden complete data, we choose and , where . As showed in Lemma 2.3, no information is gained or lost by considering stochastic processes , as hidden processes instead of and . However, our choice of the hidden random variables results in a simple E-step, and it leads to an explicit M-step for estimating the parameters.

In this case, the complete data likelihood can be written as The factors in (4.4) represent the probability functions of the hidden random variables . The factors represent the density functions of . Taking the logarithm of (4.4) yields where if and otherwise .

A new parameter estimate, say , is obtained as the parameter which maximizes the expected value of (4.5) given and a current parameter estimate . The constrained maximization gives us the following new estimate: where the conditional probabilities and the conditional expectations are given in Corollary 3.7. Then it follows from (2.5) that These expressions constitute the M-step of our EM algorithm.

#### 5. Two Applications

In this section, we present two examples to illustrate the results derived in the previous sections.

The first data set is a widely used sample of 50 components taken from [26], which possess a bath-shaped failure rate property. These data consist of times to failure of 50 devices put on life test at time 0. The data is shown in Table 1.

Table 1: Failure time data from [26].

Using our EM algorithm with , which means that there are transient states , and one absorbing state , we obtain

In many applications of the failure rate function, the total time on test (TTT) plot [26] was used to show the efficiency of the selected model. The TTT plot is obtained by plotting against . Here and , are the order statistics of the sample. To check the efficiency of the result, Monte Carlo simulations have also been carried out. We generated 500 independent TTFs, with the estimated parameters . Due to the sensitivity of TTT plot to , the simulated TTFs are truncated by the maximum of the data set. The TTT plots for the first data set and the truncated simulated TTFs are shown in Figure 3.

Figure 3: The TTT plots for the first data set and the simulated TTFs. The circles represent the real data and the solid points represent the simulated results with the estimated parameters.

Another real set of lifetime data set, taken from [27], is the failure data collected during unit testing. During a unit testing phase, the number of failures in a certain time interval (of equal length) is recorded. The total number of units tested is equal to 311. The data is given in Table 2.

Table 2: Failure time data from [27].

Using our EM algorithm with , we obtain The TTT plots for this data set and the Monte Carlo simulated TTFs are given in Figure 4.

Figure 4: The TTT plots for the second data set and the simulated TTFs. The dashed boxes represent the real data and the solid points represent the simulated results with the estimated parameters.

#### 6. Conclusions

One great advantage with the extended Markov-models is that these models can tackle operation states of nonexponential distributions under the framework of the state space representation. Based on the theories of continuous-time Markov chains and EM algorithms, this paper deals with the problem of parameters estimation for extended Markov-models. Given a data set with failure time data, for a system with transient states, the proposed EM algorithm only needs to compute the exponential of upper triangular matrices for times in each iteration. The effectiveness of the proposed EM algorithm is well demonstrated through two real world examples.

Two real applications show the practicality and efficiency of our new EM algorithm. Nevertheless, one theoretical limitation about EM algorithms is that its iterations sequence converges to a stationary point of the likelihood. Only in the case that the likelihood function is convex, the global convergence from any initial value can be ensured. However, our experience with many examples is against this theoretical drawback. In few practical cases, the stationary points are saddle points. As a simple perturbation can lead the algorithm to diverge from the saddle point, this phenomenon is thus not a problem in practice. However, to avoid that the limit point is only a local maximum of the likelihood, the initial value must be carefully selected in some cases. We suggest to use moments of related random variables to construct the initial estimates.

Of course, it will be interesting to investigate conditions ensuring the convergence to the global maximum of the likelihood. This is left for future research.

#### Acknowledgments

Dengfu Zhao and Qihong Duan acknowledge the funding of the National Natural Science Foundation of China (Grant number 50977073) and the Fundamental Research Funds for the Central Universities. Zhiping Chen and Qihong Duan acknowledge the funding of the National Natural Science Foundation of China (Grant number 70971109).

#### References

1. C. K. Pil, M. Rausand, and J. Vatn, “Reliability assessment of reliquefaction systems on LNG carriers,” Reliability Engineering and System Safety, vol. 93, pp. 1345–1353, 2008.
2. S. Y. Chen, C. Y. Yao, G. Xiao, Y. S. Ying, and W. L. Wang, “Fault detection and prediction of clocks and timers based on computer audition and probabilistic neural networks,” in Proceedings of the 8th International Workshop on Artificial Neural Networks (IWANN '05), vol. 3512 of Lecture Notes on Computer Science, pp. 952–959, 2005.
3. S. Y. Chen, Y. F. Li, and J. W. Zhang, “Vision processing for realtime 3-D data acquisition based on coded structured light,” IEEE Transactions on Image Processing, vol. 17, no. 2, pp. 167–176, 2008.
4. X. Zhao and L. Cui, “On the accelerated scan finite Markov chain imbedding approach,” IEEE Transactions on Reliability, vol. 58, no. 2, pp. 383–388, 2009.
5. S. Y. Chen, Y. F. Li, Q. Guan, and G. Xiao, “Real-time three-dimensional surface measurement by color encoded light projection,” Applied Physics Letters, vol. 89, no. 11, article no. 111108, 2006.
6. “IEC61511. Functional safety—safety instrumented systems for the process industry sector,” Geneva, Switzerland, IEC, 2004.
7. M. Xie, Y. Tang, and T. N. Goh, “A modified Weibull extension with bathtubshaped failure rate function,” Reliability Engineering and System Safety, vol. 76, pp. 279–285, 2002.
8. A. Pievatolo, E. Tironi, and I. Valadé, “Semi-Markov processes for power system reliability assessment with application to uninterruptible power supply,” IEEE Transactions on Power Systems, vol. 19, no. 3, pp. 1326–1333, 2004.
9. H. R. Guo, H. Liao, W. Zhao, and A. Mettas, “A new stochastic model for systems under general repairs,” IEEE Transactions on Reliability, vol. 56, no. 1, pp. 40–49, 2007.
10. M. Li, “Fractal time series—a tutorial review,” Mathematical Problems in Engineering, vol. 2010, Article ID 157264, 26 pages, 2010.
11. J. Endrenyi, G. J. Anders, and A. M. Leite Da Suva, “Probabilistic evaluation of the effect of maintenance on reliability—an application,” IEEE Transactions on Power Systems, vol. 13, no. 2, pp. 576–583, 1998.
12. M. Li and W. Zhao, “Representation of a stochastic traffic bound,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 9, pp. 1368–1372, 2010.
13. M. Li and S. C. Lim, “Modeling network traffic using generalized Cauchy process,” Physica A, vol. 387, no. 11, pp. 2584–2594, 2008.
14. D. R. Cox, “The analysis of non-Markovian stochastic processes by the inclusion of supplementary variables,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51, pp. 433–441, 1955.
15. C. Singh, R. Billinton, and S. Y. Lee, “The method of stages for non-Markov models,” IEEE Transactions on Reliability, vol. 26, no. 2, pp. 135–137, 1977.
16. Y. Lam, “Calculating the rate of occurrence of failures for continuous-time markov chains with application to a two-component parallel system,” Journal of the Operational Research Society, vol. 46, pp. 528–536, 1995.
17. E. A. Oliveira, A. C.M. Alvim, and P. Melo, “Unavailability analysis of safety systems under aging by supplementary variables with imperfect repair,” Annals of Nuclear Energy, vol. 32, no. 2, pp. 241–252, 2005.
18. A. Bobbio, A. Horváth, M. Scarpa, and M. Telek, “Acyclic discrete phase type distributions: properties and a parameter estimation algorithm,” Performance Evaluation, vol. 54, no. 1, pp. 1–32, 2003.
19. R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models, vol. 29 of Applications of Mathematics, Springer, New York, NY, USA, 1995.
20. R. J. Elliott, Z. P. Chen, and Q. H. Duan, “Insurance claims modulated by a hidden Brownian marked point process,” Insurance: Mathematics & Economics, vol. 45, no. 2, pp. 163–172, 2009.
21. R. N. Bhattacharya and E. C. Waymire, Stochastic Processes with Applications, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1990.
22. C. F. Van Loan, “Computing integrals involving the matrix exponential,” IEEE Transactions on Automatic Control, vol. 23, no. 3, pp. 395–404, 1978.
23. F. Carbonell, J. C. Jímenez, and L. M. Pedroso, “Computing multiple integrals involving matrix exponentials,” Journal of Computational and Applied Mathematics, vol. 213, no. 1, pp. 300–305, 2008.
24. R. J. Elliott, “New finite-dimensional filters and smoothers for noisily observed Markov chains,” IEEE Transactions on Information Theory, vol. 39, no. 1, pp. 265–271, 1993.
25. R. J. Elliott and W. P. Malcolm, “Discrete-time expectation maximization algorithms for Markov-modulated Poisson processes,” IEEE Transactions on Automatic Control, vol. 53, no. 1, pp. 247–256, 2008.
26. M. V. Aarset, “How to identify bathtub hazard rate,” IEEE Transactions on Reliability, vol. 36, no. 1, pp. 106–108, 1987.
27. M. Xie and C. D. Lai, “Reliability analysis using an additive Weibull model with bathtub-shaped failure rate function,” Reliability Engineering and System Safety, vol. 52, no. 1, pp. 87–93, 1996.