Abstract

This paper considers three competing maintenance actions: corrective after failure, condition-based preventive to avoid failure, and scheduled. A stochastic model is set for the time relationship between corrective and preventive maintenance. This model defines a version of the so-called random signs censoring, and it leads to natural goodness measures for condition control. The effect of condition control on important observable and unobservable indicators is studied. The latter half of the paper considers static Weibull models for the time to failure and the effect of condition control; the main contribution is a calculation method that derives the four model parameters from four figures of input data. Corresponding parameter experimentation is demonstrated on maintenance cost analysis and condition control design.

1. Introduction

Since some decades, competing risks models have been employed and further developed in the reliability field. The notion “random signs censoring” [1] is well established, and different refinements have been introduced, for example, the “repair alert model” [2]. The repair alert model again has been modified further, for example, to incorporate opportunistic maintenance [3]. Cooke’s [4] claim for “more models of dependent competing risks in which critical failure probability is identifiable” applies to this paper, too. For broader information about the state of art, we refer to [3, 4].

Our basic tool is an independent random variable 𝑆 that represents the condition control system in two ways. First, a combination of 𝑆 and the unknown time 𝑋 to failure models the condition-based planned time to preventive maintenance; this model defines a random signs censoring, but it is not a repair alert model. Second, in terms of 𝑆, we also define natural measures for the effect of condition control: trustworthiness, exactness, and general goodness. Thereafter, we study some indicators (probabilities and means) as functions of 𝑆,𝑋, and the time 𝐻 to next scheduled maintenance—especially their relation to trustworthiness and exactness.

In the latter half of the paper, we add three assumptions frequently used in case studies. We develop an analytic-numeric method, where the output is two-parameter Weibull models for 𝑆 and 𝑋 and the input consists of four single figures, that is, values of indicators assessed from field data or presumed for new design. For comparison, many existing methods make use of the empirical subsurvival functions, which require quite detailed input data [5]. Then, the models obtained for 𝑆 and 𝑋 act as the centre of parameter experimentation. We analyse the maintenance-related long-term costs as a function of 𝐻, and we study the requirements to be put on the condition control in order to achieve specified improvements of the indicators.

The last section shortly declares a motive for extension.

2. Three Competing Maintenance Actions

Throughout this paper, the term component will stand for any equipment intended to execute a certain task and the term time always means execution time. Assume that every possible maintenance action can be assigned to one of the following three types.CMIf a failure (critical failure) occurs, it interrupts the task execution and corrective maintenance must be performed.PMThe condition control system (maintenance crew included) proposes a time instant for preventive maintenance to avoid failure and CM.SMRegardless of the component’s condition, there is a planned time point for next scheduled maintenance.

Thus, when a service sojourn starts after installation or a maintenance action (CM, PM, or SM), there are three potential time spans to be compared: 𝑋unknown time to failure and CM,𝑌decided condition-based time to PM,𝐻planned time to next SM (𝐻= if there is no SM plan).

Now, the least member of the triplet (𝑋,𝑌,𝐻) determines the type and the time instant of the maintenance action that terminates the service sojourn (Table 1).

Let us immediately interpret some orderings of 𝑋,𝑌, and 𝐻. (a) If 𝑋<𝐻𝑌, then a failure (CM) ended the sojourn and PM was not planned at all. (b) If 𝑋<𝑌<𝐻, a PM action was planned but in fact unnecessarily, since a failure (CM) came earlier. (c) If 𝑌<𝐻𝑋,PM was performed but in fact unnecessarily, since SM would have saved the component from failure. (d) If 𝑌𝑋<𝐻, the performed PM really saved the component from failure.

Generally taken, the spans𝑋,𝑌, and 𝐻 can depend dynamically on component age, maintenance history, environment, physical conditions, strategies, and so forth. We exemplify here with such SM schedules where the actions are fixed in advance on the time axis: if the 𝑖th sojourn terminated with a CM, that is, 𝑋𝑖<𝑌𝑖&𝑋𝑖<𝐻𝑖, then the time to next SM is 𝐻𝑖+1=𝐻𝑖𝑋𝑖, and if it terminated with a PM, that is, 𝑌𝑖𝑋𝑖&𝑌𝑖<𝐻𝑖, then 𝐻𝑖+1=𝐻𝑖𝑌𝑖.

3. Basic Models and Concepts

3.1. Condition-Based Time to Preventive Maintenance

When a component has started servicing after installation or maintenance (CM, PM, and SM), there will or would be a failure after a random time𝑋. Now, the condition control system (e.g., observations, sensors, monitoring, inspections, and intuitive thinking) has partial knowledge about 𝑋 and in order to avoid the failure, it can propose a time 𝑌 to next PM. We sum up this all in a natural assumption that fits many practical cases approximately: the ratio 𝑌/𝑋is statistically independent of 𝑋. In other words, our basic model will be𝑌=𝑆𝑋,where𝑆isanindependentpositiverandomvariable.(1) Note that model (1) defines a version of “random signs censoring” because the events {𝑌<𝑋} and {𝑆<1} are identical ([2], Definition  1).

Figure 1 illustrates how the point (𝑋,𝑌) defined by (1) hits one of the 𝑥𝑦-regions CM, PM, or SM, according to Table 1. The right-hand case depicts the “better” control system, since 𝑌<𝑋 more often and 𝑌/𝑋 is closer to 1. Section 3.2 provides exact definitions.

3.2. Goodness of Condition Control

The dimensionless random variable 𝑆 is independent and determines (together with 𝑋) the condition-based time to PM(𝑌). Therefore, the cumulative distribution of 𝑆, 𝐺(𝑠), can represent the condition control itself, in this context. Thus, we define the trustworthiness and the exactness of the condition control in terms of 𝐺(𝑠):𝑅=Pr(𝑆1)=𝐺(1),(2)𝐷=𝐸(𝑆𝑆1)=101𝐺(𝑠)𝐺(1)𝑑𝑠.(3) The attribute current can be added if 𝐺 changes dynamically from sojourn to sojourn.

Verbally expressed, trustworthiness is the probability that PM saves the component from failure and exactness is the expectation of the ratio 𝑌/𝑋 given that PM saves the component from failure. Indeed, (1) implies Pr(𝑌𝑋𝑋<𝐻)=Pr(𝑌𝑋)=𝑅 and 𝐸(𝑌/𝑋𝑌𝑋<𝐻)=𝐸(𝑌/𝑋𝑌𝑋)=𝐷 for all 𝐻>0. Further, both 𝑅 and 𝐷 dwell in the unit interval with “bad” values near 0 and “good” values near 1, so 𝑅𝐷 describes the general goodness of condition control. Figure 2 illustrates the measures 𝑅, 𝐷, and 𝑅𝐷.

3.3. Indicators for Applications

Let 𝐹(𝑥) and 𝐺(𝑠) be the cumulative distributions of 𝑋 and 𝑆 for a service sojourn, and let 𝐻 be the time to next SM. The probabilities that this sojourn terminates in CM and PM (𝜅𝑐,𝜅𝑝) and the corresponding sojourn length expectations (𝜇𝑐,𝜇𝑝) are given by 𝜅𝑐𝜅=Pr(𝑆>1,𝑋<𝐻)=𝐹(𝐻)(1𝐺(1)),(4)𝑝𝐻=Pr𝑆1,𝑋<𝑆=10𝐹𝐻𝑠𝜇𝑑𝐺(𝑠),(5)𝑐1=𝐸(𝑋𝑆>1,𝑋<𝐻)=𝐹(𝐻)H0𝜇𝑥𝑑𝐹(𝑥),(6)𝑝𝐻=𝐸𝑆𝑋𝑆1,𝑋<𝑆=1𝜅𝑝10𝑠0𝐻/𝑠𝑥𝑑𝐹(𝑥)𝑑𝐺(𝑠).(7) We will also pursue the probability 𝐾 for unnecessarily planned PM (i.e., the observable event that PM was planned but a failure occurred earlier) and the probability 𝐿 for unnecessarily performed PM (i.e., the unobservable event that PM was performed but SM would later have saved the component from failure): 𝐾=Pr(𝑋<𝑆𝑋<𝐻)=1𝐹𝐻𝑠𝑑𝐺(𝑠),(8)𝐿=Pr(𝑆𝑋<𝐻𝑋)=𝐻𝐺𝐻𝑥𝑑𝐹(𝑥).(9)

We clarify some immediate relations to the condition control. Good trustworthiness (𝑅) means low probabilities of failure (𝜅𝑐) and unnecessarily planned PM (𝐾), because 𝐾𝜅𝑐1𝑅 (by definitions). Good exactness (𝐷) again leads to low probability of unnecessarily performed PM(𝐿).

To justify the last statement, consider a sequence 𝑆𝑛 with cumulative distributions 𝐺𝑛(𝑠),𝑛=1,2, such that the exactness 𝐷𝑛1 (3). We state that 𝐺𝑛(𝑠)0 for every 𝑠(0,1) or, equivalently, 𝐺𝑛(𝐻/𝑥)0 for every 𝑥>𝐻. (If this were not true, there would be an 𝑠(0,1) and a 𝛿>0 such that 𝐺𝑖(𝑠)>𝛿 for all 𝑖 in an infinite set of positive integers, but this brings the contradiction 𝐷𝑖<1(1𝑠)𝛿for all 𝑖). Now, pick any 𝜀>0 and let 𝑎>𝐻 be such that 1𝐹(𝑎)<𝜀/2. The Lebesgue convergence theorem [6, Page 229] says that there is an integer 𝑛𝜀>0 such that 𝐴=𝑎𝐻𝐺𝑛(𝐻/𝑥)𝑑𝐹(𝑥)<𝜀/2,for𝑛>𝑛𝜀. Hence, 𝐿𝑛𝐴+1𝐹(𝑎)<𝜀,for𝑛>𝑛𝜀. Since 𝜀 was arbitrary, we have 𝐿𝑛0.

4. Static Models for Time to Failure and Condition Control

4.1. Additional Assumptions and the Main Problem

Until now, we have allowed 𝑋,𝑆, and 𝐻 to change dynamically from sojourn to sojourn. However, if sufficiently detailed input data from separate sojourns is not available, the modeller can be forced to add, for example, the following assumptions.(i)The time-to-failure distribution 𝐹(𝑥) is the same for every sojourn; that is, every CM, PM, and SM restores the component to the same condition.(ii)The distribution 𝐺(𝑠) is the same for every sojourn; that is, the relevant properties of the condition control do not change from sojourn to sojourn.(iii)The planned time 𝐻 to next SM is the same for every sojourn. (This means of course that the SM schedule moves forward after every CM and PM.)

Under these assumptions, all sojourns are statistically alike and a realization will be called CM sojourn or PM sojourn or SM sojourn, according to the terminating maintenance action. Recalling the indicators (4)–(7), we get a system of equations: ShareofCMsojourns=𝜅𝑐=5%,(4)ShareofPMsojourns=𝜅𝑝=16%,(5)AveragelengthofCMsojourns=𝜇𝑐=0.85𝐻,(6)AveragelengthofPMsojourns=𝜇𝑝=0.72𝐻.(7) The quantities 𝜅𝑐,𝜅𝑝,𝜇𝑐,𝜇𝑝 are simple field data or requirements for new design, and they are assumed to satisfy the natural conditions: 𝜅𝑐>0,𝜅𝑝>0,0<𝜅𝑐+𝜅𝑝<1,0<𝜇𝑝<𝜇𝑐<𝐻.(10) (The numerical values above will be used for subsequent illustrations.)

In the following sections, we show how to solve the system of equations (4)–(7) for the unknown cumulative distributions in the format𝐹(𝑥)=1𝑃(𝑥/𝐻)𝛼(0<𝑃<1,𝛼>0),(11)𝐺(𝑠)=1𝑄𝑠𝛽(0<𝑄<1,𝛽>0).(12) Note that the SM interval 𝐻 is also a scale parameter, as a change of variable (𝑥=𝐻𝑡) in (4)–(7) and (11) immediately shows.

4.2. Solution of 4 and 5

We start solving the problem posed above to get 𝑃, 𝛼, 𝑄, and 𝛽. By inserting (11) and (12) in (4) and (5) and by changing integration variable in (5),𝑢=1𝐺(𝑠), we obtain𝜅𝑃=𝑃(𝑄)=1𝑐𝑄,(13)𝐴(𝑄,𝛾)=1𝑄1𝑄𝑃(𝑄)(ln(𝑄)/ln(𝑢))𝛾d𝑢=𝜅𝑝,𝛾=𝛼/𝛽.(14) Since 𝑃 and 𝑄 are probabilities, (13) implies a first restriction 𝜅𝑐<𝑄<1. By inserting 𝛾=0 and 𝛾 on the left side of (14), we obtain the marginal functions 𝐴(𝑄,0)=(1/𝑄1)𝜅𝑐 and 𝐴(𝑄,)=1𝑄. They are both decreasing in 𝑄 and take the value 𝜅𝑝 once, namely, for the 𝑄-values: 𝑄min=𝜅𝑐𝜅𝑐+𝜅𝑝,𝑄max=1𝜅𝑝𝜅𝑐<𝑄min<𝑄max.<1(15)

These facts imply that 𝐴(𝑄,0)< 𝜅𝑝<𝐴(𝑄,) if and only if 𝑄min<𝑄<𝑄max. Because of continuity, there is for each 𝑄 in the interval (15) a positive number 𝛾=𝛾(𝑄) satisfying (14) and for other 𝑄 there are no 𝛾. In addition, by partial differentiation, one finds that 𝐴(𝑄,𝛾) is increasing as a function of 𝛾 for each 𝑄 in (𝜅𝑐,1) and decreasing as a function of 𝑄 for each 𝛾>0. This monotony settles some facts: 𝛾(𝑄) is unique and increasing and the triplets (𝑄,𝛾(𝑄),𝑃(𝑄)) form the solutions of (4) and (5).

Figure 3 collects the results of the numerical example (Section 4.1): 𝑄min=0.238095,𝑄max=0.840000, and 𝛾(𝑄) calculated for 200 equidistant 𝑄-values.

4.3. Solution of (6) and (7)

In the next step, we change variable in (6), 𝑢=1𝐹(𝑥), and insert (11) and (12). Thereby, we obtain the dimensionless form of (6),1𝐵(𝑃,𝛼)=1𝑃1𝑃ln(𝑢)ln(𝑃)1/𝛼𝜇𝑑𝑢=𝑐𝐻.(16) It is easy to see that 𝐵(𝑃,𝛼) is increasing as a function of 𝛼 for every𝑃(0,1) and to find the marginal functions: 𝐵(𝑃,0)0, 𝐵(𝑃,)1. Since 0<𝜇𝑐/𝐻<1, there is because of continuity, for every 𝑃(0,1), a unique 𝛼=𝛼(𝑃)>0 such that 𝐵(𝑃,𝛼)=𝜇𝑐/𝐻. Moreover, taking into account (13) and (15), we need here only the 𝑃 interval: 𝑃min=1𝜅𝑐𝜅𝑝,𝑃max=1𝜅𝑐𝜅𝑝1𝜅𝑝0<𝑃min<𝑃max.<1(17)

Figure 4 depicts the results of the numerical example (Section 4.1): 𝑃min=0.790000,𝑃max=0.940476, and 𝛼(𝑃) calculated for 200 equidistant 𝑃-values.

The functions 𝑃(𝑄),𝛾(𝑄), and 𝛼(𝑃) are now available. After changing variables in (7), 𝑣=1𝐺(𝑠),𝑢=1𝐹(𝑥), we obtain the dimensionless form:=1𝐶(𝑄)𝜅𝑝1𝑄ln(𝑣)ln(𝑄)𝛾(𝑄)/𝛼(𝑃(𝑄))×1𝑃(𝑄)(ln(𝑄)/ln(𝑣))𝛾(𝑄)ln(𝑢)ln(𝑃(𝑄))1/𝛼(𝑃(𝑄))=𝜇𝑑𝑢𝑑𝑣𝑝𝐻.(18) By inserting 𝑄min and the corresponding 𝑃min, 𝛾=0,𝛼(𝑃min)<,𝛽=, we obtain 𝐶(𝑄min)=𝜇𝑐/𝐻. Similarly, by inserting 𝑄max and the corresponding 𝑃max, 𝛾=, 𝛼(𝑃max)<,𝛽=0, we obtain 𝐶(𝑄max)=0. Since 𝜇𝑝<𝜇𝑐 and 𝐶(𝑄) is continuous, there is a 𝑄 in the interval (15) such that 𝐶(𝑄)=𝜇𝑝/𝐻.

Figure 5 depicts the results of the numerical example (Section 4.1): 𝐶(𝑄) calculated for 200 equidistant 𝑄-values and the solution 𝑄=0.6449 of (18).

4.4. Results and Comments

The solution 𝑄 of (18) determines now the other parameters: 𝑃=𝑃(𝑄),𝛾=𝛾(𝑄),𝛼=𝛼(𝑃),and𝛽=𝛼/𝛾. Insertion of the parameters 𝑃,𝛼,𝑄,and 𝛽 in (11) and (12) yields the time-to-failure distribution 𝐹(𝑥) and the condition control representative 𝐺(𝑠). The numerical values of the example (Section 4.1) are depicted in Table 2.

Remark 1. The calculation method in Section 4.24.4 lends itself for relatively simple and reliable computation, especially because of the stepwise structure and the monotony of certain functions mentioned above. The solutions of (14) and (16), 𝛾(𝑄) and 𝛼(𝑃), were iterated for 200 equidistant values of the variables in their respective domains, and linear interpolation was used for intermediate values. The software to be employed for accurate programming may readily offer built-in equation solving, numerical integration, and simple programming tools.

Remark 2. The input quadruplet (𝜅𝑐,𝜅𝑝,𝜇𝑐,𝜇𝑝) led to the output quadruplet (𝑃,𝛼,𝑄,𝛽). However, the same tools also apply to certain mixed input quadruplets, for example, (𝜅𝑐,𝜅𝑝,𝛼,𝛽), (𝜅𝑐,𝜅𝑝,𝑃,𝛼), (𝜅𝑐,𝜅𝑝,𝑃,𝛽), (𝜅𝑐,𝜅𝑝,𝑄,𝛼), and (𝜅𝑐,𝜅𝑝,𝑄,𝛽). Indeed, check the consistency of the input in (10), (15), and (17) and then apply (6), (7), (13), (14), and the relation 𝛼=𝛽𝛾 in suitable order to get the remaining unknowns.

5. Parameter Experimentation

Experimentation with model parameters is a common way to get first-hand information about causalities and sensitivities. Conceive a case with the assumptions and the data of Section 4.1, and with SM interval 𝐻=1. Table 2 delivers the parameter values of the static representatives of the component and the condition control: 𝐹(𝑥)=1𝑃𝑥𝛼where𝑃=0.9225,𝛼=5.7909,𝐺(𝑠)=1𝑄𝑠𝛽where𝑄=0.6449,𝛽=3.0263.(19)

Example 1. We look at the maintenance-related costs as a function of 𝐻, while the component (𝑃,𝛼) and the condition control (𝑄,𝛽) stay unchanged. Suppose that the costs for different maintenance actions are proportionally 𝐷=1 per SM, 𝐷𝑐=10 per CM, 𝐷𝑝=4 per PM, and 𝐷𝑝0=1 per unnecessarily planned PM (8). (These costs can include work, spare parts, facilities, lost revenue, penalties, etc.) The indicators (4)–(8) are functions of 𝐻 only, so the long-term maintenance-related cost per unit execution time is given by =𝜅Cost(𝐻)𝑐(𝐻)𝐷𝑐+𝜅𝑝(𝐻)𝐷𝑝𝜅𝑐(𝐻)𝜇𝑐(𝐻)+𝜅𝑝(𝐻)𝜇𝑝(𝐻)+1𝜅𝑐(𝐻)𝜅𝑝𝐻+(𝐻)1𝜅𝑐(𝐻)𝜅𝑝(𝐻)𝐷+𝐾(𝐻)𝐷𝑝0𝜅𝑐(𝐻)𝜇𝑐(𝐻)+𝜅𝑝(𝐻)𝜇𝑝(𝐻)+1𝜅𝑐(𝐻)𝜅𝑝𝐻.(𝐻)(20) Figure 6 depicts the cost function, from which we can pick Cost(1)=2.0502 (original), Cost(0.7691)=1.8053 (minimum), and Cost()=6.5904 (without SM).

Example 2. We analyse the effect of the condition control (𝑄,𝛽) while (𝑃,𝛼) and 𝐻=1 stay fixed. Suppose that we want the probability of failure down from the present 𝜅𝑐=𝐹(1)𝑄=5%to3%, (4). Clearly, this requires that 𝑄=0.64493/5=0.3869. Because of this change, the probability of unnecessarily performed PM (9) increases from 𝐿=0.132to0.258, but suppose that we want 𝐿=0.100. By solving (9), we get the required 𝛽=7.0871. Moreover, according to Table 3, the required changes (𝜅𝑐=5%3%and𝐿=0.1320.100) correspond to improving the trustworthiness from 𝑅=0.3551𝑡𝑜0.6131 and the exactness from 𝐷=0.7280to0.8481. In fact, this correspondence is always valid, since there is a continuous bijection between the pairs (𝑅,𝐷) and (𝑄,𝛽).

6. A Motive for Extension

Discrete event simulation is the superior alternative for modelling complex real-world processes over long periods [7, Page 2–4], and the related field of random variate generation provides the modeller with an enormous variety of algorithms, (e.g., [8]). Generalizations, extensions, ex tempore decisions, and other intricate details are much easier to model in a proper stochastic simulation environment than in an analytic formulation. An imperative benefit, for example, for persuasive cost and risk accounting, is the output of various complete probability distributions (instead of “means”).

These facts among many motivate developing our concepts towards dynamic simulation. The constellation and the tools of Sections 2 and 3 are applicable as such, but Sections 4 and 5 deal with static models only. Let us sketch a first step, where the time 𝑋 to next failure (and CM) can depend on the component’s age 𝑡 (cumulated execution time) at the sojourn start. Now, the distribution of 𝑋 could be, for example, of the type 𝐹(𝑥𝑡,𝛼)=1𝑒Λ(𝑡,𝛼)Λ(𝑡+𝑥,𝛼) that generates a nonhomogenous Poisson failure process with cumulative rate Λ(𝑡,𝛼). The model parameter(s) 𝛼 can be maximum likelihood estimated from typical field data consisting of consecutive sojourns where PMs and SMs are right-censored CMs.