Research Article | Open Access
A Parameter Estimation Method of Shock Model Constructed with Phase-Type Distribution on the Condition of Interval Data
The phase-type distribution (also known as PH distribution) has mathematical properties of denseness and closure in calculation and is, therefore, widely used in shock model constructions describing occurrence time of a shock or its damage. However, in the case of samples with only interval data, modeling with PH distribution will cause decoupling issues in parameter estimation. Aiming at this problem, an approximate parameter estimation method based on building PH distribution with dynamic order is proposed. Firstly, the shock model established by PH distribution and the likelihood function under samples with only interval data are briefly introduced. Then, the principle and steps of the method are introduced in detail, and the derivation processes of some related formulas are also given. Finally, the performance of the algorithm is illustrated by a case with three different types of distributions.
The shock model is usually used in the study of a system’s reliability. The model was first proposed by Esary et al.  to describe time intervals between adjacent shocks and its damage by random distributions, as shown in Figure 1. When the cumulative damage (blue line) caused by the shock sequence (green lines) exceeds a certain threshold (red line), the system fails.
In the shock model, the occurrence time of shocks is usually expressed as a Poisson process [2, 3]; that is, the time interval of adjacent shocks satisfies the exponential distribution, and the one shock damage is also assumed to satisfy the same cluster. Of course, other forms of distributions [4–6], such as gamma distribution, log-normal distribution, and Weibull distribution, also have been used in constructing shock models. While these distribution clusters enhance the model’s adaptability, it also increases the difficulties in estimating their parameters.
Observed sample data are required to obtain the model’s parameters. When facing point sample data, their form of log-likelihood function is usually a sum form of its distribution with the logarithmic operator in the front, that is to say, the parameters are easy to be decoupled and methods of MLE (maximum likelihood estimation), including various types of Newton iterative methods , can be used for estimating parameters and obtaining a result with a high accuracy. When the sample data are interval data, the likelihood function often contains a large number of convolution operations since the data are incomplete. At this time, exponential distribution is often assumed to simplify the calculation, while it may cause the model to be inconsistent with the actual situation. Of course, some Monte Carlo methods  have also been proposed to solve the computational problems caused by convolution, but the estimation accuracy of the methods is usually not controllable.
Phase-type distribution (known as PH distribution) is a kind of probability distribution clusters based on continuous-time Markov process . Suppose a continuous-time Markov process with states, wherein the states are transient states, the state is an absorbing state, and represents the state of the random process at moment . Then, the infinitesimal generator matrix of the Markov process can be expressed as , wherein is a -dimensional invertible matrix and is a -dimensional column vector, satisfying , where is a column vector and full elements of 1 with corresponding dimensional.
Based on the above Markov process, a -order continuous PH distribution can be defined. A random variable is distributed as an absorption time taken to transition from any initial state to the absorbing state , wherein represents a row vector of probability distribution of each initial phase, is called the phase-type generator, and represents a column vector of absorption rate in which the respective phase transfers to the absorbing state.
The cumulative distribution function (CDF) of the PH distribution is
The probability density function (PDF) of the PH distribution is
The PH distribution is dense in the positive abscissa axis; that is, it can fit any type of probability distributions with the random variable located on the interval . In addition, the PH distribution also has the feature of closure in convolution operations, which greatly simplifies the calculation. Due to the above properties, it is widely used in shock modeling. Neuts and Bhattacharjee  first used PH distribution in shock model construction. Ochoa et al. [11, 12] used PH distribution to characterize time intervals of adjacent shocks and one shock damage to analyze reliability of systems. These works are constantly expanding the adaptability of shock models, but it can also be seen that the PH distribution is currently mainly used in shock model constructions under the samples with point data or partial interval data.
There are three general parameter estimation methods for PH distribution : maximum likelihood method, Monte Carlo method, and EM method. Among them, the maximum likelihood method, including various types of Newton iterative methods, is often used to solve some specific types of simple PH distributions; the Monte Carlo method is mainly used for specific forms of PH distribution with point data; the expectation maximization method (also known as the EM method) is the most widely used method for parameter estimation; however, it is mainly used in PH distribution with point data  or partial interval data . If only interval data can be observed, such as only censored data and interval failure data can be obtained, the model constructed with PH distribution will face the decoupling problem when performing parameter estimation. At this time, the methods described previously cannot be used directly.
Aiming at this problem, an approximate method for parameter estimation of the shock model constructed with PH distribution is proposed. By giving an estimated error and constructing PH distribution with dynamic order, the parameters of the shock model with only interval data samples can be estimated. The paper is organized as follows: in Section 2, the shock model constructed by PH distribution and the likelihood function with only interval data samples are briefly introduced. Section 3 reconstructs the process statistics based on the framework of the traditional EM method , explains the approximation principle in detail, derives the approximate estimated expression of the model parameters, and also gives the total steps of the entire algorithm. Section 4 tests the performance of the algorithm through a case with three different types of distributions; Section 5 summarizes the method proposed in this paper and briefly introduces the follow-up work.
2. Shock Model and Likelihood Function
2.1. Shock Model Based on PH Distribution
Some assumptions:(1)The failure threshold of a component is , and the censored time is .(2)The shock damage caused by the th shock is an iid random variable, and the corresponding PDF is , , and the CDF is , . The cumulative damage caused by the number of shock is , let , and the corresponding CDF is , and the PDF is .(3)The occurrence time of the th shock is , let , and the corresponding CDF is , , and the PDF is , . The time interval is also an iid random variable, and the corresponding PDF is , , and the CDF is , .(4)The occurrence time of a shock is independent of its damage and also independent of the cumulative damage that has been caused by shock sequence.
Let , , represent the probability that the occurrence time of the th shock exceed for the first time. Based on the previous assumptions, the reliability at moment iswhere represents the number of shocks that the component receives at the moment and represents the left limit of the threshold.
The following describes the specific forms of the previous parameters by PH distributions:(1)Assume that the one shock damage satisfies order PH distribution , wherein represents the probability distribution vector of initial phases, which satisfies , and represents the phase-type generator. Let represent the column vector of absorption rate. Then, the CDF of is , . The corresponding PDF is , .(2)The cumulative damage caused the number of shock satisfies the PH distribution , wherein represents the distribution probability vector of initial phases and represents the phase-type generator, in which represents the transfer rate matrix at the occurrence of a shock and the angular scale, respectively, represents the number of rows and columns of a combined vector or matrix. The corresponding column vector of absorption rate is . Then, the CDF of is , . The corresponding PDF is , . Let present the probability that the cumulative damage is no less than x after the nth shock for the first time. Then, , , , where represents a column vector that the th to the th elements are 1 and others are 0, with corresponding order.
The distribution of the adjacent shock time interval and shock time is ( order) and . The assumption forms of these two distributions are similar to the shock damage distributions described previously.
2.2. Likelihood Function of Interval Data
Assume that the samples are independent of each other and the data can only be obtained at the censored time . The information of observed sample data is as follows:(1)The total number of the samples is (2)The failure mode set of the samples is , wherein represents the censored sample and represents the failure sample (3)The number of shocks received by the samples is
Let represent all parameters to be estimated, where represents the parameter set of the distribution of the one shock damage and represents the parameter set of the distribution of the time interval. Based on the assumption that the occurrence time of the shock is independent of its damage, the likelihood function can be expressed separately as shown in Table 1.
In summary, the overall likelihood function based on the sample data is
It can be seen in Table 1 that parameters in equations with the integral cannot be decoupled easily with the general methods proposed in Section 1. Combined with the assumed model above, observation data, and likelihood function proposed in this section, a specific method on estimating parameters is introduced as follows.
3. Principles, Steps, and Derivation of the Method
It is not difficult to see from Table 1 that forms of the likelihood functions are essentially the same, so only the estimating process of parameters is described in detail here, which can also be used for estimating . According to the main steps of the EM method , it is necessary to construct a complete data likelihood function.
3.1. Process Statistics Construction
Four variables and an indicator function are defined in the following:(1) represents the phase of the sample after the th phase transition in the th shock damage period;(2) represents the damage caused by phase ;(3) represents the number of phase transitions of the sample during the th shock damage period, including the times when the damage period changes;(4) represents the number of additional shocks that may cause failure if the sample is censored with shock number . It can be seen that is a random variable and represents its specific sample value;(5)Indicator function represents whether the phase of the sample is after the th phase transition in the th shock damage period.
With the above variables and function, four process statistics can be constructed as follows:(1)The set of statistics represents whether the initial phase of the sample is during the th shock damage period. The values of are shown in Table 2.
Let represent a set of process statistics of all samples, where represents the set of process statistics of the sample .
3.2. Parameter Estimation Expression
With the above process statistics, the likelihood function under the complete data can be expressed as follows:wherein, the complete data likelihood function of the censored sample is
The complete data likelihood function of the failure sample is
A log-likelihood function can be obtained from equation (5):
According to the EM method, the posterior distribution of process statistics is
Assuming that the parameter values have been obtained by previous iteration, the expectation of posterior distribution of process statistics in equation (8) by equation (9) iswhere the expectation of the log-likelihood function with the censored sample data is
And, the expectation of the log-likelihood function with the failure sample data is
The parameters to be estimated satisfy the following constraints:(1)The initial phase probability satisfies (2)For any phase,
With the MLE method, the parameters can be expressed in the following equations:
The general steps of the EM method are Step E: solving the expectation sets of posterior distribution of the process statistics proposed in Section 3.1: Step M: bringing the result obtained in step E into equation (13) to obtain an iteration value of parameters .
It is not difficult to see that the set of process statistics constructed in Section 3.2 is infinite, and expectations of the posterior distribution of the process statistics cannot be obtained at this time, so an approximate method is introduced in the following sections.
3.3. Approximate Principle
Assume that the sample is censored after the th shock, and the probability of failure after another th shock is .
Then, the survival probability of the sample after the th shock can be expressed as
Let , then it is not difficult to see that for any determined PH distribution and failure threshold, is constantly satisfied.
That is, given any positive real number , there must exist at least one such that for any , is constantly satisfied.
Now, assume that is sufficiently small and satisfies and , then equation (15) can be approximated by a function of finite summation:where represents a column vector that the element from to is 1 and the others are 0, with corresponding order.
Based on the above description, the parameters can be approximated as
3.4. Expectations of Process Statistics with Posterior Distribution
After approximation, the expectations of process statistics with posterior distribution can be obtained referring to the method in [14, 15] by making some improvements. The results are given directly as follows:(1)Censored sample :(1)The expectations of with posterior distribution: ① When , where . ② When ,