Special Issue: Mathematical Applications to Reliability and Maintenance Problems in Engineering Systems
Research Article | Open Access
Reliability Analysis of a Cold Standby System with Imperfect Repair and under Poisson Shocks
This paper considers the reliability analysis of a two-component cold standby system with a repairman who may take vacations. The system may fail due to intrinsic factors such as aging or deterioration, or due to external factors such as Poisson shocks. The shocks arrive according to a Poisson process with a given intensity. Whenever the magnitude of a shock exceeds the prespecified threshold of the operating component, the operating component fails. The paper assumes that the intrinsic lifetime and the repair time of each component follow extended Poisson processes, that the magnitude of the shock and the threshold of the operating component are nonnegative random variables, and that the vacation time of the repairman follows a general continuous probability distribution. Using vector Markov process theory, the supplementary variable method, the Laplace transform, and Tauberian theory, the paper derives a number of reliability indices: system availability, system reliability, the rate of occurrence of system failures, and the mean time to the first failure of the system. Finally, a numerical example is given to validate the derived indices.
Reliability has a wide range of applications in engineering and natural science, and its theoretical study has attracted considerable attention in the reliability literature. In practical engineering applications, the two-component cold standby system is one of the most important models. Such a system is composed of a primary component and a backup component, and the backup component is called upon only when the primary component fails. Cold standby systems are commonly used in noncritical applications. They are important structures in reliability engineering and have been widely applied in practice.
Most researchers assume that the system fails due to intrinsic factors such as aging or deterioration. In practice, however, external factors can also cause system failure. For example, a computer system may fail because of a virus or an attack by an intruder; since viruses and intruders arrive randomly, they are called stochastic shocks. Another example is a metal material subjected to a very rapid hot-to-cold or cold-to-hot change: the internal temperature changes sharply and produces large thermal stress. This phenomenon is called thermal shock. In such cases, the system may fail because of its adverse environment.
The shock model, a familiar class of models in reliability theory, has been extensively studied. Traditionally, three classic random shock models have received most of the attention: extreme shock models, cumulative shock models, and δ-shock models [5, 6]. In an extreme shock model, the system fails when an individual shock is too large; papers on this model assume that the operating component fails if the magnitude of a shock exceeds its threshold. In recent decades, most existing research has focused on one-component systems under Poisson shocks, assuming that the damage caused by individual shocks accumulates and that the system fails when the accumulated damage reaches a certain level. This model does not apply to every problem, however. Wang and Zhang study a shock model for a repairable system with two types of failure. They assume that two kinds of shock in a sequence of random shocks make the system fail: one in which the interarrival time between two consecutive shocks is less than a given positive value, and one in which the magnitude of a single shock exceeds a given positive value. Under this assumption, they obtain reliability indices of the shock model such as the system reliability and the mean working time before system failure. More recent work observes that the operating environment of a component is random and that the working component may be influenced by other causes; it extends the model by treating the threshold of the component as a random variable, which is a more realistic assumption. Because the external environment varies, it is common for random factors to affect system reliability, so understanding how the system operates under them is of practical significance.
With the development of science and technology, high-precision and high-reliability systems are taken increasingly seriously, which gives shock models broader applications.
In practical engineering applications, a system ages or deteriorates, and after a failure it usually cannot be repaired to an as-good-as-new condition. It is therefore more reasonable to assume that the successive operating times of the system after repair become shorter and shorter, while the successive repair times after failure become longer and longer; eventually, the system no longer works. These patterns of operating and repair times can be described by geometric processes, as many authors have done [10, 11]. The geometric process has been applied in reliability analysis and maintenance policy optimization by many authors [12–14]. However, the geometric process cannot describe a bathtub-shaped failure process. The bathtub shape refers to the regular way in which the reliability of a product changes over its whole life cycle, from commissioning to retirement. If the failure rate is plotted on the vertical axis against operating time on the horizontal axis, the resulting curve is high at both ends and low in the middle; it looks like a bathtub and is therefore called the bathtub curve. To overcome this shortcoming of the geometric process, Wu and Clements-Croome [15, 16] introduce the extended Poisson process (EPP), which can describe a bathtub-shaped failure intensity.
Existing research mostly focuses on reliability analysis of the behavior of the systems themselves; reliability analysis of a system whose repairman takes vacations is less studied. From the viewpoint of using resources rationally, introducing repairman vacations makes the repairable system model more realistic, because most small and medium-sized enterprises cannot afford a full-time repairman. During a vacation, the repairman can purchase parts or do other work that increases the benefit of the system. The repairman therefore usually plays three roles: looking after the facility while idle, repairing failed components, and doing other work while on vacation. Under normal circumstances, the repairman periodically checks the status of the system while idle. If he finds that the system has failed, he repairs it immediately after the end of the vacation; otherwise, he leaves the system again for the next vacation. Doshi gives a comprehensive survey of vacation system models. Su and Shi discuss the reliability of a multicomponent series system in which the repairman takes multiple vacations. Yue et al. study Gaver's parallel repairable system attended by a cold standby component and a repairman with multiple vacations; in addition, they investigate the effect of the parameters on the steady-state availability by numerical comparison and analyze the benefit of the system.
Motivated by the above, this paper considers a two-component cold standby system in which the operating component may fail due to intrinsic or external factors and in which the repairman can take vacations. We assume that the intrinsic lifetime and the repair time of the components follow extended Poisson processes and that the shocks arrive according to a Poisson process, while the vacation time of the repairman follows a general continuous probability distribution. The paper derives the reliability indices, and a numerical example is given to validate them.
2. Definitions and Assumptions
In this paper, we use the following terminology and notation.
(1) The time interval from the completion of the $(n-1)$th repair to the completion of the $n$th repair of component $i$ is called the $n$th cycle of component $i$, where $i = 1, 2$; $n = 1, 2, \ldots$.
(2) Time to failure that is due to the intrinsic cause is called the intrinsic lifetime of component $i$.
(3) Time to failure that is due to the extrinsic shocks is called the shock lifetime of component $i$.
: Operating time of component $i$ in the $n$th cycle
: Intrinsic lifetime of component $i$ in the $n$th cycle
: Operating time of component $i$ in the $n$th cycle under Poisson shocks
: Operating time of component $i$ in the $n$th cycle
: Cumulative distribution function (cdf) of
: cdf of
: cdf of
: Repair time of component $i$ in the $n$th cycle
: Vacation time of the repairman in the $n$th cycle
: Magnitude of each shock
: Threshold of component $i$
: State of the system at time $t$.
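Definition 1. Given random variables $X$ and $Y$, $X$ is said to be stochastically larger than $Y$, written $X \geq_{st} Y$ (equivalently, $Y$ is stochastically smaller than $X$, written $Y \leq_{st} X$), when $P(X > \alpha) \geq P(Y > \alpha)$ for all real $\alpha$. A stochastic process $\{X_n, n = 1, 2, \ldots\}$ is stochastically increasing (decreasing) if $X_{n+1} \geq_{st} X_n$ ($X_{n+1} \leq_{st} X_n$) for all $n = 1, 2, \ldots$.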
Definition 2. A sequence of nonnegative independent random variables $\{X_n, n = 1, 2, \ldots\}$ is called a geometric process (GP) if, for some $a > 0$, the cumulative distribution function of $X_n$ is $F(a^{n-1} x)$ for $n = 1, 2, \ldots$. The constant $a$ is called the parameter of the GP.
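With Definition 2, writing $a$ for the parameter of the GP $\{X_n, n = 1, 2, \ldots\}$, we have the following.
(1) If $a > 1$, then $\{X_n\}$ is stochastically decreasing: $X_n \geq_{st} X_{n+1}$ for $n = 1, 2, \ldots$.
(2) If $0 < a < 1$, then $\{X_n\}$ is stochastically increasing: $X_n \leq_{st} X_{n+1}$ for $n = 1, 2, \ldots$.
(3) If $a = 1$, then $\{X_n\}$ is a renewal process.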
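As a rough illustration of the three cases above, the following sketch assumes the usual GP form in which the cdf of the $n$th term is $F(a^{n-1}x)$, and further assumes (for illustration only) that $F$ is exponential. The function names and the parameter `base_mean` are hypothetical and are not taken from the paper.

```python
import random


def gp_mean(base_mean: float, a: float, n: int) -> float:
    """Mean of the n-th GP term when the first term has mean base_mean.

    If X_n has cdf F(a**(n-1) * x), then X_n is distributed as X_1 / a**(n-1).
    """
    return base_mean / a ** (n - 1)


def sample_gp(base_mean: float, a: float, n_terms: int, rng: random.Random) -> list:
    """Draw X_1..X_n assuming (illustratively) an exponential base cdf F."""
    return [
        rng.expovariate(a ** (k - 1) / base_mean)  # rate grows by a factor a each cycle
        for k in range(1, n_terms + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    for ratio in (1.2, 1.0, 0.8):  # decreasing, renewal, increasing
        means = [round(gp_mean(10.0, ratio, n), 3) for n in range(1, 6)]
        print(ratio, means, [round(x, 2) for x in sample_gp(10.0, ratio, 5, rng)])
```

With `a > 1` the expected operating times shrink from cycle to cycle, which is how a GP is typically used for successive operating times; `a < 1` gives growing terms, as used for successive repair times.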
Below we will introduce a new process which can be used to describe scenarios with complicated failure intensities.
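Definition 3. A sequence of nonnegative independent random variables $\{X_n, n = 1, 2, \ldots\}$ is called an extended Poisson process (EPP) if, for some $\alpha + \beta \neq 0$, $\alpha, \beta \geq 0$, $a \geq 1$, and $0 < b \leq 1$, the cumulative distribution function (cdf) of $X_n$ is $G((\alpha a^{n-1} + \beta b^{n-1}) x)$, where $G$ is an exponential cdf. The constants $\alpha$, $\beta$, $a$, and $b$ are the parameters of the process.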
With Definition 3, the following can be obtained.
Scenarios
(1) If , then the EPP is a homogeneous Poisson process (HPP).
(2) If and (or and ), then the process is a GP.
(3) If and , then the process can describe the periods from the intrinsic failure period to the wear-out period in a bathtub curve.
(4) If , , , then the process can describe the periods from the burn-in period to the end of the intrinsic failure period in a bathtub curve.
(5) If , , , , then the process can describe more complicated failure intensity curves.
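The bathtub behavior can be seen in a small numerical sketch. It assumes the form of the EPP attributed to Wu and Clements-Croome, in which the $n$th term is exponential with rate proportional to $\alpha a^{n-1} + \beta b^{n-1}$, with $a \geq 1$ and $0 < b \leq 1$; the function names and the parameter values below are hypothetical and chosen only for illustration.

```python
def epp_rate_multiplier(n: int, alpha: float, a: float, beta: float, b: float) -> float:
    """Scaling factor applied to the base exponential rate in the n-th cycle
    (assumed EPP form: alpha * a**(n-1) + beta * b**(n-1))."""
    return alpha * a ** (n - 1) + beta * b ** (n - 1)


def epp_failure_rates(base_rate: float, n_terms: int,
                      alpha: float, a: float, beta: float, b: float) -> list:
    """Failure rate of each of the first n_terms cycles."""
    return [
        base_rate * epp_rate_multiplier(n, alpha, a, beta, b)
        for n in range(1, n_terms + 1)
    ]


if __name__ == "__main__":
    # a = b = 1: the rate never changes, i.e. a homogeneous Poisson process.
    print(epp_failure_rates(1.0, 5, alpha=0.3, a=1.0, beta=0.7, b=1.0))
    # a > 1 and b < 1: the rate first falls (burn-in) and then rises (wear-out).
    print(epp_failure_rates(1.0, 8, alpha=0.05, a=1.5, beta=1.0, b=0.5))
```

The growing term $\alpha a^{n-1}$ models wear-out and the decaying term $\beta b^{n-1}$ models burn-in; a single geometric ratio cannot produce both effects at once, which is the limitation of the GP noted in the introduction.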
The following assumptions hold throughout.
(A1) The system is composed of two components, a switch, and a repairman. At the initial time, both components are new: component 1 is operating, component 2 is on cold standby, and the repairman is idle. Once the operating component fails, the cold standby component is switched to the operating state; the switch is perfect, and the switching time is negligible.
(A2) The system is subject to shocks. The shocks arrive according to a Poisson process with a given intensity. The magnitudes of the shocks are independent random variables with a common distribution function.
(A3) When a shock arrives, it affects only the operating component, and the operating component fails only when the magnitude of the shock exceeds its threshold. The threshold of component $i$ ($i = 1, 2$) is a nonnegative random variable with its own distribution function, and the shocks are mutually independent.
(A4) The intrinsic lifetimes and the repair times of the components each follow an extended Poisson process. That is, the distributions of the intrinsic lifetime and of the repair time of component $i$ in the $n$th cycle are of the form given in Definition 3, each with its own set of parameters.
(A5) The repairman follows a single-vacation rule, as follows. When a component fails while the repairman is present, it is repaired immediately. Once the failed component is repaired and no failed component remains in the system, the repairman takes a vacation. If one component fails while the other is being repaired, the newly failed component must wait for repair and the system is down. If two components are waiting for repair when the repairman returns from a vacation, the repair rule is “first-in-first-out.” If there is no failed component when the repairman returns from a vacation, he remains idle until the first failed component appears. The vacation length of the repairman follows a general continuous distribution.
(A6) The failure of the operating component may be caused by intrinsic factors or by external shocks, and the system fails only if both components have failed.
(A7) After repair, neither component is as good as new. All random variables and processes are statistically independent.
3. Model Development
Based on the above model assumption (A3), we can get the probability that one shock causes the operating component failure in the th cycle which is
Lemma 4 (see ). The distribution function of is hence, .
Lemma 5 (see ). The operating time of the component in the th cycle is ; its distribution function is ; then hence, , .
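The reasoning behind the two lemmas can be illustrated with a small simulation. The sketch below makes several assumptions that are not stated in this form in the text: the threshold is drawn afresh and independently at each shock, the shock magnitude and the threshold are both exponential (which gives a closed-form failure probability per shock), and the intrinsic lifetime in the cycle considered is exponential. Under these assumptions the fatal shocks form a thinned Poisson process, so the shock lifetime is exponential, and the operating time, being the minimum of the intrinsic lifetime and the shock lifetime, is exponential with the sum of the two rates. All names and parameter values are hypothetical.

```python
import random


def shock_failure_prob(mag_rate: float, thr_rate: float) -> float:
    """P(magnitude > threshold) for independent exponential magnitude
    (rate mag_rate) and exponential threshold (rate thr_rate)."""
    return thr_rate / (mag_rate + thr_rate)


def operating_time_rate(intrinsic_rate: float, shock_intensity: float, p_fail: float) -> float:
    """Rate of min(intrinsic lifetime, shock lifetime) when both are exponential."""
    return intrinsic_rate + shock_intensity * p_fail


def simulate_operating_time(intrinsic_rate, shock_intensity, mag_rate, thr_rate, rng):
    """Simulate one operating period shock by shock (assumed model, see lead-in)."""
    intrinsic = rng.expovariate(intrinsic_rate)
    t = 0.0
    while True:
        t += rng.expovariate(shock_intensity)  # next shock arrival
        if t >= intrinsic:
            return intrinsic                   # intrinsic failure comes first
        # Assumption: a fresh, independent threshold is compared at every shock.
        if rng.expovariate(mag_rate) > rng.expovariate(thr_rate):
            return t                           # fatal shock


if __name__ == "__main__":
    rng = random.Random(1)
    p = shock_failure_prob(1.0, 1.0)
    rate = operating_time_rate(0.5, 2.0, p)
    sims = [simulate_operating_time(0.5, 2.0, 1.0, 1.0, rng) for _ in range(20000)]
    print("theoretical mean:", 1.0 / rate, "simulated mean:", sum(sims) / len(sims))
```

If the threshold were instead fixed for the whole cycle, the shock lifetime would be a mixture of exponentials rather than a single exponential, so the closed form above would no longer apply.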
The system state at time $t$ takes one of the following values.
State 0. At time $t$, component 1 is operating, component 2 is on cold standby, and the repairman is idle.
State 1. At time $t$, component 2 is operating, component 1 is on cold standby, and the repairman is idle.
State 2. At time $t$, component 1 is operating; component 2 is being repaired.
State 3. At time $t$, component 2 is operating; component 1 is being repaired.
State 4. At time $t$, component 2 is being repaired; component 1 is waiting for repair.
State 5. At time $t$, component 1 is being repaired; component 2 is waiting for repair.
State 6. At time $t$, component 1 is operating, component 2 is on cold standby, and the repairman is taking a vacation.
State 7. At time $t$, component 2 is operating, component 1 is on cold standby, and the repairman is taking a vacation.
State 8. At time $t$, component 1 is operating, component 2 is waiting for repair, and the repairman is taking a vacation.
State 9. At time $t$, component 2 is operating, component 1 is waiting for repair, and the repairman is taking a vacation.
State 10. At time $t$, both components are waiting for repair; the repairman is taking a vacation.
The state space is therefore $\{0, 1, \ldots, 10\}$, where the set of operating states is $\{0, 1, 2, 3, 6, 7, 8, 9\}$ and the set of failure states is $\{4, 5, 10\}$. Under the assumptions above, the state process alone is not a Markov process. Hence, we introduce the following supplementary variables: the number of cycles of component $i$ ($i = 1, 2$) at time $t$, and the elapsed vacation time when the repairman is taking a vacation at time $t$.
Then is a continuous four-dimensional vector Markov process with state space , , , , , , , , , , where ; .
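The partition of the eleven states can be written down as a small lookup structure, which is convenient when the reliability indices are later expressed as sums of state probabilities. This is only a sketch: the set names and the helper `availability_from_marginals` are hypothetical, and the probabilities passed to it are assumed to be marginal state probabilities, that is, already summed over the cycle number and integrated over the elapsed vacation time.

```python
from typing import Mapping

# State labels follow the list of States 0-10 in the text.
OPERATING = frozenset({0, 1, 2, 3, 6, 7, 8, 9})   # one component is operating
FAILED = frozenset({4, 5, 10})                    # both components are down
REPAIRMAN_IDLE = frozenset({0, 1})                # repairman present, nothing to repair
REPAIRMAN_REPAIRING = frozenset({2, 3, 4, 5})     # a repair is in progress
REPAIRMAN_ON_VACATION = frozenset({6, 7, 8, 9, 10})


def is_system_up(state: int) -> bool:
    """True when the system is in an operating state."""
    return state in OPERATING


def availability_from_marginals(probs: Mapping[int, float]) -> float:
    """Sum the marginal state probabilities over the operating states."""
    return sum(p for state, p in probs.items() if state in OPERATING)


if __name__ == "__main__":
    uniform = {s: 1.0 / 11 for s in range(11)}  # dummy numbers, not results
    print(availability_from_marginals(uniform))
```

The same pattern applies to the idle and vacation probabilities of the repairman, using `REPAIRMAN_IDLE` and `REPAIRMAN_ON_VACATION` in place of `OPERATING`.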
The state probabilities of the system at time are defined by
By the nature of the cold standby, and by the fact that component 1 and component 2 are operating alternately in system, we can obtain that their cycles meet the following relations:
So the continuous four-dimensional vector Markov process can be converted into a continuous three-dimensional vector Markov process , where is the number of cycles of component 1 at time . Then the state probabilities of the system at time are defined by Then, we can get each state transition diagram as shown in Figure 1, where , .
By using the probability arguments and limiting transitions, we can get the following differential equations for the system; for , we have Let tend to zero; we have In the same way, we have Their boundary conditions are The initial conditions are We introduce the Laplace transform and we denote the Laplace transform of by , , and by , ; . The Laplace transforms of the above differential equations are, respectively, given by The boundary conditions are According to (15)–(18), we have where When , we have
As the formulae , have been calculated, when , by (17), we can obtain ; combining with (17), then we can get . By the recurrence formula (17), we can obtain , when . So when , , , and , , can be obtained.
4. Reliability Indices
4.1. System Availability
By the definition of the system availability , we have The Laplace transform of is given by where Then, according to the Tauberian theorem, the limiting availability of the system is given by
This is consistent with intuition: since neither of the two components is as good as new after repair, the system availability tends to 0 as $t \to \infty$.
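For reference, the general shape of this argument can be written as follows. The symbols $A(t)$, $A^{*}(s)$, and $P_j(t)$ are assumed names introduced here for illustration, with $P_j(t)$ denoting the marginal probability that the system is in state $j$ at time $t$; the explicit expressions of the paper are not reproduced.

```latex
% Availability as the probability of being in an operating state
A(t) = \sum_{j \in \{0,1,2,3,6,7,8,9\}} P_j(t),
\qquad
A^{*}(s) = \int_{0}^{\infty} e^{-st} A(t)\,dt .

% Tauberian (final value) step, valid when the limit on the left exists
\lim_{t \to \infty} A(t) = \lim_{s \to 0^{+}} s\,A^{*}(s).
```

The same step is used for the steady-state idle probability of the repairman, with the sum taken over States 0 and 1.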
4.2. System ROCOF
Let be the expected number of the system failures in the . Its derivative is called the rate of occurrence of the system failure (ROCOF) at time . Then, With the result  and in view of the system model analysis, we have The Laplace transform of is where
4.3. The Idle Probability of the Repairman
By the system analysis, we can obtain that the repairman is idle when and only when the repairman vacation is over; one component is operating, while the other is on cold standby. Thus, the idle probability of the repairman at time is given by The Laplace transform of is Then, according to the Tauberian theorem, the idle probability of the repairman in the steady state is given by
This conclusion is also consistent with intuition. After repair, the components are not as good as new: their successive operating times are stochastically decreasing, and their successive repair times are stochastically increasing. Eventually, the components become irreparable in practice, so the repairman is occupied with repairs almost all of the time. This implies that the idle probability of the repairman tends to zero as $t \to \infty$.
4.4. The Vacation Probability of the Repairman
Let be the repairman vacation probability, by the system analysis; when the system is at 6, 7, 8, 9, and 10 states, the repairman is taking vacation. So the vacation probability of repairman at time is The Laplace transform of is
4.5. System Reliability
In order to obtain the reliability of the system, we let the above three failure states 4, 5, and 10 be the absorbing states, and we can obtain another vector Markov process . Let , . Let , . Denote the state probabilities of the system at time . Using the method similar to Section 3, we have the following differential equations: The initial conditions are Let , be the Laplace transform of , , respectively, so we have , , and , ; . The Laplace transforms of the above differential equations are, respectively, given by The solutions of above equations can be written as With the definition of the system reliability , we have With the above explicit expressions, we obtain that the Laplace transform of is
4.6. System MTTFF
Let be the lifetime of the system before being failed. According to the definition of the mean time to the first failure (MTTFF), we have With (40), then we can get
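The step from the reliability function to the MTTFF is the standard one and can be sketched as follows. The symbols $T$, $R(t)$, and $R^{*}(s)$ are assumed names for the time to the first system failure, the system reliability, and its Laplace transform; the explicit expression obtained in the paper is not reproduced here.

```latex
% Reliability is the survival function of the time to first failure T
R(t) = P(T > t), \qquad R^{*}(s) = \int_{0}^{\infty} e^{-st} R(t)\,dt .

% The mean of a nonnegative random variable is the integral of its survival function
\mathrm{MTTFF} = E[T] = \int_{0}^{\infty} R(t)\,dt = \lim_{s \to 0^{+}} R^{*}(s).
```

In other words, once the Laplace transform of the reliability is available in closed form, the MTTFF follows by evaluating it at $s = 0$, without inverting the transform.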