Parameter Estimation of a Multistate Model for an Aging Piece of Equipment under Condition-Based Maintenance
We study a multistate model for an aging piece of equipment under condition-based maintenance and apply an expectation maximization algorithm to obtain maximum likelihood estimates of the model parameters. Because of the monitoring discontinuity, we cannot observe any state's duration. The observation consists of the equipment's state at an inspection or right after a repair. Based on a proper construction of stochastic processes involved in the model, calculation of some probabilities and expectations becomes tractable. Using these probabilities and expectations, we can apply an expectation maximization algorithm to estimate the parameters in the model. We carry out simulation studies to test the accuracy and the efficiency of the algorithm.
A multistate model may provide more flexibility than a traditional binary state model for modeling equipment conditions. In a multistate model, a piece of equipment is allowed to experience more than two possible states, for example, completely working, partially working, partially failed, and completely failed. Even if every state has an exponential duration distribution, the equipment has a nonexponentially lifetime distribution. Therefore, many authors utilize multistate models for equipment in many disciplines. Examples arise from electric power systems , the nuclear science , electron devices , and so on. Furthermore, although the deterioration process of a system is usually continuous, it is convenient to model it by several discrete states. For example, in , a deterioration process is classified into four states: initial, minor deterioration, major deterioration, and failure. Besides, [5, 6] define several discrete states for a system based on different ranges of the hazard rate of the system; that is, a deterioration process is often modeled by a multistate model.
On the other hand, maintenance is an indispensable action to keep the reliability level of industrial equipment. To reduce the total maintenance costs, condition-based maintenance is usually implemented. References [7, 8] are good overviews of condition-based maintenance. In this paper, condition-based maintenance involves three key ideas: a routine inspection, a condition indicator, and a repair. Routine inspections are executed with a fixed benchmark interval. In an inspection, we can acquire the current condition indicator for the equipment. An appropriate repair is carried out when the indicator is not less than a warning value. Moreover, we also record the condition indicator right after the equipment is repaired. This kind of condition-based maintenance is recommended by an IEEE standard , and there are many types of equipment operating in such mode [10, 11].
We assume that every state of the equipment has an exponential duration distribution specified by its parameter. In the paper, we investigate the parameter estimation problem, based on the data obtained from routine inspections and repairs.
Such a parameter estimation problem for a multistate model plays an important role in many applications [12, 13]. The problem attracts much attention, and many approaches are proposed. Reference  reviews several quite sophisticated empirical approaches. A method of curve fitting is presented in  for the probability density function (pdf) of the equipment lifetime. In [14–16], based on proportional hazards models, different approximate methods are proposed to obtain a maximum likelihood estimate (MLE). In these methods, it is assumed that the condition indicators are time independent or the state of the equipment is fixed within two consecutive inspections. Those assumptions are reasonable only for a small benchmark interval or frequent inspections. However, frequent inspections may increase the total costs and may induce a high chance of accidents.
There is an alternative method to obtain an MLE. To better illustrate the idea, we start from a complete data set of every state’s durations through continuous monitoring. As it has an exponential duration distribution, the reciprocal of the average of durations of a state is the MLE of its parameter. This simple example gives a clue to the case of condition-based maintenance. Then, given the incomplete data from routine inspections and repairs, we can calculate conditional expectations of every state’s duration, and we may deduce an MLE. The theoretical foundation of the above discussion is the expectation maximization (EM) algorithms. The EM algorithms are first introduced in  to calculate an MLE for incomplete data. Due to its theoretical property, (see, [18, 19] e.g.,), the EM algorithms have good performance in many applications.
Recently, a few researchers have tried to use EM algorithms for problems similar to ours. For example, based on a complicated discussion,  applied an EM algorithm to a specific 3-state model under devised condition-based maintenance. However, there is no hint in  to extend the result to a general multistate model. In general, although an EM algorithm may have good performance, it is difficult to apply an EM algorithm to a specific model. Besides the choice of hidden random variables or stochastic processes for a model, the calculation of involved conditional expectations is often difficult.
In this paper, we first propose a proper mathematical framework and a special technique of distributions to simplify the computation of conditional expectations. Then we apply an EM algorithm to obtain MLEs of the model parameters, with the observation of routine inspections and repairs. The paper is organized into seven sections. Section 2 establishes a model for our problem. In Section 3, we apply an EM algorithm, which is based on several conditional expectations, to the model. Section 4 establishes formulas of these conditional expectations. Section 5 presents simulation results, and Section 6 presents an application to a real-world dataset. The paper is concluded in Section 7.
In this section, we propose the mathematical framework for our problem. In the following Model 1, we present basic concepts of a new multistate model for an aging piece of equipment under condition-based maintenance. Besides, specific hidden random variables are proposed in Model 1.
Model 1. In the multistate model, a piece of equipment with discrete states under condition-based maintenance can be described by a predefined warning value and two families of random variables and , . The random variable has an exponential distribution specified by a parameter , .The state space of is . Finally, random variables and are independent.
In Model 1, for both and , the subscript indicates a time index, and the superscript presents the state of the equipment at the moment. Concretely, we assume that the state of the equipment is at a time instance and a repair takes place, and is the state of the equipment right after the repair. And denotes the residual duration of state after a time instance , given that there is no maintenance again. As introduced in Section 1, the duration of state has an exponential distribution, and is its parameter. Due to the memoryless property of an exponential distribution, the distribution of the residual duration is also exponential. Here and in the following, we assume that the benchmark interval is 1 for simplicity of argument.
To introduce some derivational concepts to describe the behavior of the equipment clearly, we introduce several families of auxiliary random variables. These random variables, , , and , are recursively defined. First, , .
For , where that indicates the state of the equipment at a time instance . In fact, assume that the state of the equipment is at a time instance , is the maximal number such that the total duration of states , and is less than 1. In other words, between two consecutive inspections at time indexes and , the equipment goes through states .
For , where indicates the time index when the th repair takes place.
Moreover, and for , Here denotes the state of the equipment right after a time instance , whether there is maintenance or not. In fact, if and no repair takes place, we have . If , there is a proper repair, and denotes the state of the equipment right after the repair, as showed in (2.3).
3. EM Algorithm
We now apply the EM algorithm to estimate the parameter , , given pairs of observations and , .
The EM algorithm, introduced in , is an iterative solution to MLE. Each iteration consists of two steps. In the Expectation step, the following functions are sought: where parameters . stands for parameters to be estimated in the current iteration, and denotes the parameters estimated in the previous iteration. As indicated in , the expectation in (3.1) is taken over the hidden random variable and and is the joint pdf of and , with the parameter . Estimates of the parameters are obtained in the maximization step by solving the following optimization: The estimation formulas are established by setting
As the pdf of the repairs has nothing to do with and the joint pdf of , has the following form:(3.3) leads to an unique solution . Here and for ,
Algorithm 3.1. Step 1. Let , and choose an initial value .Step 2. Set .Step 3. Stop and output if the stop criterion is satisfied.Step 4. Set and go to Step 2.
4. Conditional Expectations
To calculate the conditional expectations involved in (3.5), we propose some lemmas and theorems in the following.
Lemma 4.1. Assume that random variables are independent, has an exponential distribution specified by a parameter , and, for , we have . Then, the summation has a pdf . Here, for . is a function for arguments , defined by The above denominator is when .
The proof of Lemma 4.1 is based on the properties of polynomials and is presented in Appendix A. Further, we can calculate a useful conditional expectation and some basic probabilities based on Lemma 4.1.
Lemma 4.2. Under the assumption and notations in Lemma 4.1, we have the conditional expectation . Here, for ,
Proof. Let , then and are independent. Based on Lemma 4.1, the pdf of has a form
for , where
Let be the joint pdf of and . Since and are independent and , we have the following: Hence, the conditional pdf of with respect to is In the sequel, it follows from the expression of that The result follows from the above equation.
Lemma 4.3. Suppose that a random variable has an exponential distribution specified by a parameter , random variables and are independent, and for . Under the assumption and notations of Lemma 4.1, we have and . Here two functions and have the following form:
Proof. By computing the definite integral of the pdf of on the interval , we can obtain the expression of .
It follows from the independence among and that and are independent. Therefore, Here and in the following, . Besides, is the pdf of . In the sequel, follows from the above integral.
Then we can calculate several auxiliary important conditional expectations.
Theorem 4.4. Under the assumption and notations of Lemma 4.3, we have Here, and, for , has the following form: Besides, And, for , has the following form: Moreover, has the following form:
Proof. In the following, we calculate three expectations respectively.Case 1. . We have the following:
Here and are pdfs of random variables and , respectively. We have . The denominator is presented in Lemma 4.3. Similarly to the proof of Lemma 4.3, the above numerator is
Simplifying the above expression, we can obtain that .Case 2. We have . When , it follows from that we have
When , according to the total expectation formula, we have
Then it follows from Lemma 4.2 and (4.20) that
is the above definite integral.Case 3. We have . When , it is obvious that
and the result follows.
When , similarly to the discussion of Case 2, we have As and are independent, and is a function of , we have . Hence, it follows from Lemma 4.2 that The result follows from the above equation.
Lemma 4.5. For random variables involved in Model 1, we have the following: Here, and .
Theorem 4.6. We assume that there is a guess for the parameter vector of Model 1 for which if . Let for and . When , we have the following: When , we have Finally, when and , we have
5. Simulation Results
In this section, we test the efficiency and accuracy of the EM algorithm, based on simulation data.
At first, we investigate the necessity of the condition “ for ” in Theorem 4.6. We can find similar conditions for many theorems and lemmas in Section 4, and there is the root in Lemmas 4.1 and 4.2. Although these conditions make our analysis clear and convenient, they appear to be obstacles for an application.
Now we turn to the assumption of Lemma 4.2. Assume that random variables , , and are independent, and has an exponential distribution specified by a parameter . Based on Lemma 4.2, we can compute the conditional expectation of conditional on the summation to be any value, for example, 1.
Let and We draw the surface of in Figure 1. Figure 1 shows that although does not exist if , , or , but by filling proper values, is continuous. That is, has some removable discontinuities. The result is obvious from the viewpoint of engineering; why expectations in Section 4 have “jumps" for any special cases? Similar conclusion can be obtained for the assumption of Lemma 4.1 from the viewpoint of mathematics. In fact, the pdf of is an analytical function on . As a convolution, the pdf of is also an analytical function on , . And any discontinuity is removable for an expression of such a function.
Similar discussions can be applied to other theorems and lemmas in Section 4. In a word, these conditions may not trouble us for any application. For a numerical implement of the EM algorithm, if parameters become the same by accident, a small perturbation can fix the problem.
Secondly, we present a method to simulate the behavior of a 4-state model for an aging piece of equipment under condition-based maintenance, based on some preset parameters. The simulation can provide us a test data set.
Without maintenance, the equipment would run through states 1, 2, 3 orderly to a failure state 4. With maintenance, the straight path to a failure is regularly deflected by inspections and maintenance. In an inspection, if we find that the current state is 1 or 2, no repair is applied. If the state is 3, an appropriate repair is carried out. In this case, the state of the equipment right after the repair is a random variable, which is 1 with probability 0.1, is 2 with probability 0.3, and is 3 with probability 0.6. If the state of the equipment is 4, we must replace it with a new piece of equipment, and the state right after this replacement is 1. The duration of states 1, 2, and 3 have exponential distributions specified by parameters 0.3, 0.29, and 0.5. These parameters are suggested by .
Now we turn to five experiments in which we test the efficiency and accuracy of the EM algorithm, based on simulation data provided by the above method.
In the first experiment, we study the relationship between convergence error and the number of iterations. As EM algorithms are recursive, this experiment will provide us with an intuitive stop criterion for Algorithm 3.1. For a data set from 100 inspections, Figure 2 shows that all curves of the estimated parameters , , and converge while the number of iterations increases. We can conclude from Figure 2 that for 100 or more iterations, we can obtain reasonable results.
In the second experiment, we run the EM algorithm for different initial values. Figure 3 draws the estimated parameter pairs for 100 different initial values which are chosen randomly from 0.001 to 15. We can see that the EM algorithm converges to the same result for a great range of initial values.
In the third experiment, we run the EM algorithm for different sample sizes. Given the number of inspections = 20, 40, 100, 200, 400, and 1000, we run the simulating equipment and obtain 6 data sets. Then based on these data sets, we estimate parameters , , and , respectively. For the EM algorithm, the initial values of estimated parameters are , and the number of iterations is 100. Table 1 shows the estimated parameters for different data sets. Of course, the estimated parameters are nonmonotonic for different numbers of inspections, which is inherent to such a random problem.
In the fourth experiment, we repeat the third experiment 100 times to obtain the standard errors of each estimated parameter. Table 2 shows the standard errors of the estimated parameters. Again, all cases demonstrate a decrease in the standard errors, when the number of inspections increases. We can conclude from Tables 1 and 2 that, for 100 or more inspections, we can obtain reasonable estimated parameters with an acceptable standard error.
Finally, in the fifth experiment, we compare our method with another method.
As indicated in Section 1, [14–16] used an assumption that the state of the equipment is fixed during two consecutive inspections for their problems. Under this assumption, we can establish an MLE for every parameter. For pairs of observations and , , we write , , and for , Here records the moment when the state of the equipment is changed and records the corresponding new state. Let be the number of the set . Then, under the assumption of [14–16], records the th duration of the state of the equipment. Finally, let the total number of changing be the maximal subscript of . Then, we have the following MLE:
On the other hand, although we assume that, in our method, the benchmark interval is 1 for simplicity of argument, it is not too difficult to implement our method for other benchmark intervals. We run the simulating equipment for 400 time units and obtain inspection data sets for different benchmark intervals, respectively.
Figure 4 shows the estimated parameter of two methods. We can conclude that for frequent inspections, (the corresponding benchmark interval is less than 0.2) two methods have similar behaviors. However, we can find that two methods show a striking contrast when the benchmark interval is not less than 0.2. As indicated in , the latter is more important for practical situations. As our method is stable for all cases, it is more practical.
In this section, we apply the EM algorithm to a real-world dataset. The dataset is a 25-year record of inspections and repairs of a power transformer substation under condition-based maintenance. The power transformer substation consists of two power transformers, and it is a cool backup system. Once the primary transformer incurs faults, the secondary transformer replaces it automatically, and it can generally serve the entire load. The benchmark interval of inspections is three months. During an inspection, a transformer is repaired if it incurs faults, and the repaired primary transformer is put into operation. The repair is perfect because we can replace a transformer if necessary.
The behavior of the power transformer substation can be described by a 3-state model with a warning value and a three-month benchmark interval. Here state 1 implies that the primary transformer works, 2 implies that the secondary transformer works, and 3 implies that two transformers incur faults. In Table 3, we present the record of inspections. Here and are defined by (2.1), (2.3), and(2.4)
Based on the dataset given in Table 3, we apply Algorithm 3.1 and obtain the following MLE and for the primary and the secondary transformers, respectively That is, the mean duration of the primary transformer is 41.2 months, and it is 6.10 months for the secondary transformer.
We use the total time on test (TTT) plot to check the efficiency of the result. The TTT plot is obtained by plotting
against . Here is the order statistics of the sample. See  for a detailed introduction to TTT plot. From the dataset of Table t010207, the total lifetime of two transformers are 22, 81, 54, 54, 21, 9, and 45 months. We apply Monte Carlo simulations, with the MLE parameters and , to generate 100 independent total lifetime of two transformers. The TTT plots for the real-world, dataset and the simulated dataset is shown in Figure 5. In Figure 5, the circles represent the real data, and the solid points represent the simulated data with the estimated parameters. Figure 5 shows the efficiency of the result.
In this paper, we have addressed the parameter estimation problem for the multistate model for aging equipment under condition-based maintenance. Based on the memoryless property of exponential distributions, we proposed a convenient mathematical framework for the problem. In this framework, the calculation of involved conditional expectations became tractable. Then we applied an EM algorithm to obtain MLE of parameters. A sequence of simulation experiments shows that the estimated parameters converge to the preset value of the parameters, even for a moderate number of inspections. Moreover, the EM algorithm is stable for different length of benchmark intervals. Hence, the algorithm can be recommended for practical applications. It is convenient to extend the algorithm to the situation with random benchmark intervals.
In this paper, we assume that the state of the equipment can be directly observed at an inspection. However, it is a difficult task for many types of industrial equipment. In these cases, some other variables may be used to obtain estimates of the states. To establish a full model consisting of equipment, inspections, indirect variables, and investigating parameter estimations, it is challenging and valuable for practical situations.
A. Proof of Lemma 4.1
We proceed by an induction on to show the result in Lemma 4.1.
For , the result clearly holds. And for the convenience of the following discussion, we analyze the case of . In this case, the pdf of is the convolution of the pdf of and the pdf of . Thus, for ,
Assume that for , (4.1) and (4.2) hold. That is, for , where Let , we have the following: Due to the linearity of convolution, we can derive from (A.1) that, for , Here, for , Therefore, (A.1) holds for . As to , we have
It is obvious that is a polynomial in a variable , and the degree of is less than . Moreover, we have , . Since a polynomial with a degree less than must be a zero polynomial if it has different roots. Hence, , or . Hence,
So the result is true for . This completes the proof.
B. Proof of Lemma 4.5
For a time instance , let and . We will consider the following filtrations:
Let and for given . We will prove that for every events , , and , That is, and are independent, conditional on and .
It is obvious that . Moreover, as we have . Hence, due to the independence among , , and , we have the following: Therefore, we have the following: Further, as and are independent, we obtain that It follows from (B.5), and the above equation that This implies (B.2).
Then, we have the following: As for every , taking the projection of both sides of (B.8), we have the result
The authors acknowledge the funding of the National Natural Science Foundation of China (Grant Number 50977073). Qihong Duan acknowledges the funding of the National Natural Science Foundation of China (Gant Number 70971109).
Y. Ding, P. Wang, L. Goel, R. Billinton, and R. Karki, “Reliability assessment of restructured power systems using reliability network equivalent and pseudo-sequential simulation techniques,” Electric Power Systems Research, vol. 77, no. 12, pp. 1665–1671, 2007.View at: Publisher Site | Google Scholar
J. Endrenyi and G. J. Anders, “Aging, maintenance, and reliability approaches to preserving equipment health and extending equipment life,” IEEE Power and Energy Magazine, vol. 4, pp. 59–67, 2005.View at: Google Scholar
IEEE Std 338, IEEE Standard Criteria for the Periodic Surveillance Testing of Nuclear Power Generating Station Safety Systems, 2006.
Office of the Secretary of Defense, Report of the Defense Science Board Permanent Task Force on Nuclear Weapons Surety on Nuclear Weapons Inspections for the Strategic Nuclear Forces. pp. 28—32, 2008.
D. Banjevic, A. K. S. Jardine, V. Makis, and M. Ennis, “A control-limit policy and software for condition-based maintenance optimization,” INFOR Journal, vol. 39, no. 1, pp. 32–50, 2001.View at: Google Scholar