An Accelerated Simulation Approach for Multistate System Mission Reliability and Success Probability under Complex Mission
The estimation of mission reliability and success probability for multistate systems under complex mission conditions is studied. The reliability and success probability of multistate phased mission systems (MS-PMS) are difficult to model and solve analytically, so an estimation approach based on Monte Carlo simulation is established. By introducing accelerated sampling methods such as forced transition and failure biasing, the sampling efficiency of small-probability events is improved while unbiasedness is preserved. A ship’s propulsion and power systems are used as the application, and the effectiveness of the method is verified by a numerical example. Under complex missions, such as missions with different mission times and their combinations, and phased missions, the proposed method samples small-probability events more efficiently than the crude simulation method. The numerical example also examines how mission factors and system reliability and maintainability factors influence system availability and mission success probability, and analyzes the relationship between mission type and both indexes.
The mission process of a system is often composed of multiple phases that are consecutive and nonoverlapping in time and functionally related. A system or its constituent units are often multistate, gradually transitioning from normal operation to complete failure. Such multistate phased mission systems (MS-PMS) [1, 2] usually have complex structures. The large number of components and subsystems, the switching between phases, and the complexity of the running process make it difficult to evaluate and predict mission reliability and mission success. Phased mission systems are now widely used in industrial and military applications. For weapon and equipment systems, being able to estimate reliability and mission success indexes over the mission profile is very beneficial to engineering design and to maintenance support design, and it provides a basis for evaluating the combat effectiveness of the equipment.
There are currently two main approaches to assessing mission reliability and success probability: analytical methods and simulation methods. Analytical methods can be divided into combinatorial model methods [3–6] and state space methods [7–10]. Combinatorial model methods include the reliability block diagram method and the fault tree analysis method. The fault tree method can be combined with the binary decision diagram (BDD) [3–5] and its derivatives, such as the multivalued decision diagram (MDD), the aggregated binary decision diagram (ABDD), and the logarithmically encoded BDD (LBDD). These methods make reliability problems in MS-PMS easier to model and calculate. However, a BDD is a directed acyclic graph that can handle only a limited number of basic events, and it has difficulty handling the reliability of repairable systems. State space methods include the Markov method [8–10] and the Petri net method, both of which are based on stochastic process theory. The Markov method combined with the universal generating function can effectively address state space explosion. Analytical methods can effectively analyze system reliability in a specific mission, but they cannot calculate the success probability of a specific mission. To deal with PMS, analytical methods connect the phases in series and treat one unit as a different unit in each phase, so the phase dependencies of components also need to be considered. The simulation method is general and can effectively assess both system reliability and mission success. It is also less affected by the complexity of the system structure and mission requirements, so these do not create great difficulties in modeling, model processing, or calculation. The core of the simulation method is the Monte Carlo method [15–17], which generates random events to simulate the behavior of the system in a specific mission.
In simulation, the component state is updated as simulation time advances rather than phase by phase, so the problem of phase dependency of components does not arise. However, because crude simulation has low computational efficiency and produces results with large variance, specific sampling techniques such as forced transition (FT) and failure biasing (FB) need to be adopted [15, 16, 18].
The remainder of this paper is organized as follows. Section 2 explains the principle of state transition simulation of multistate systems. Section 3 explains the principles of forced transition and failure biasing and clarifies the conditions under which each sampling method is used and how the statistical indicators are corrected. Section 4 evaluates the reliability and success probability of three complex missions set against the background of warship navigation missions. Section 5 draws conclusions based on the numerical examples.
2. Multistate System Simulation Method
When the system structure, function, and dynamic behavior are complex, system reliability assessment is often difficult to model and solve using analytical methods. In contrast, the Monte Carlo simulation method is a more feasible approach. The Monte Carlo method simulates the dynamic behavior of the system by generating random discrete events. For convenience, system state changes such as failures and degradations are collectively referred to as random failure transitions, and system state changes such as maintenance and repair are collectively referred to as random repair transitions. The Monte Carlo method can be used not only for two-state systems but also for multistate systems. The basic sampling method used in multistate system simulation is indirect Monte Carlo (IMC). One sampling consists of two steps: first, the random variable ξ is used to determine the time of the state transition; then, the random variable ζ is used to determine, by roulette selection, the unit that undergoes the transition and the state it enters, as shown in Figure 1. F⁻¹(·) in the figure is the inverse of the cumulative distribution function of the system state holding time.
For a multistate system, the subsystems and units have multiple performance states, and the transitions between states obey specific probability distributions. The system state transition at a given moment is determined by a probability density function (PDF), which fully describes the random behavior of the system in the time domain. A state transition has two elements: which state the transition enters, and when the transition takes place. The conditional probability that a transition occurring at time t enters a particular state is the ratio of the transition rate into that state at time t to the total transition rate out of the current state at time t, where the total is taken over the reachable state set of the current state. By convention, a transition rate is denoted λ when the system fails or degrades after the transition and μ when the system is repaired after the transition. The total transition rate out of the current state is then the sum of the λ rates over the reachable set of fault or degraded states plus the sum of the μ rates over the reachable set of repair states.
The conditional probability density that the next state transition occurs at time t, given the time of the previous transition, is the total transition rate out of the current state at time t multiplied by the probability that no transition occurs between the two times.
The state probability transition kernel of a multistate system can be expressed as a PDF that takes the pair formed by the current state and time to the pair formed by the state and time after the state transition; it is the product of the transition-time density and the conditional probability of the entered state. After several state transitions, the system forms a random walk sequence of state transitions.
Any step of the random walk can be generated by sampling from the probability transition kernel.
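Because the display equations of this section did not survive in the present text, the relations described above are restated here as a sketch under assumed notation. The symbols (k′, t′) for the current state and its entry time, (k, t) for the next state and transition time, Λ for the total outgoing rate, and Ω^F, Ω^R for the reachable failure and repair sets are illustrative choices and may differ from the original ones; r stands for λ or μ depending on the transition type.

```latex
% Assumed notation (not taken from the original equations):
%   (k', t') current state and its entry time; (k, t) next state and transition time
%   r_{k' \to k} is \lambda_{k' \to k} for a failure/degradation, \mu_{k' \to k} for a repair
\begin{align}
q(k \mid k', t) &= \frac{r_{k' \to k}(t)}{\Lambda_{k'}(t)},
  \qquad k \in \Omega_{k'} = \Omega^{F}_{k'} \cup \Omega^{R}_{k'}, \\
\Lambda_{k'}(t) &= \sum_{k \in \Omega^{F}_{k'}} \lambda_{k' \to k}(t)
  + \sum_{k \in \Omega^{R}_{k'}} \mu_{k' \to k}(t), \\
f(t \mid t', k') &= \Lambda_{k'}(t)\,
  \exp\!\left(-\int_{t'}^{t} \Lambda_{k'}(s)\,\mathrm{d}s\right), \\
K(k, t \mid k', t') &= f(t \mid t', k')\, q(k \mid k', t).
\end{align}
```

With constant rates, the third relation reduces to the exponential density of the holding time t − t′ with parameter Λ.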
First, the time of the system state transition is sampled, and then the state that the system enters is determined randomly by roulette. The holding time until the next state transition is obtained by inverse transform sampling, that is, by applying F⁻¹(·) to ξ, where ξ is a random number uniformly distributed between 0 and 1; the transition time t is the time of the previous transition plus this holding time. For example, when the transition time obeys the exponential distribution, the holding time is the negative logarithm of 1 − ξ divided by the total transition rate out of the current state.
The specific state transition is then determined by roulette according to equation (8), where ζ is a random number uniformly distributed between 0 and 1. The unit interval is divided into subintervals whose lengths equal the conditional probabilities of the reachable transitions; when ζ falls within the subinterval of a given transition, the system transfers to the corresponding state, which determines both the device that undergoes the state transition and the state it enters. IMC generates a state transition using only two random numbers and does not require comparison or similar operations, thereby improving the calculation efficiency.
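The two-step IMC sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes exponentially distributed holding times with constant rates, and the function name `imc_step` and the dictionary `rates` (mapping a hypothetical transition label such as (device, new state) to its rate) are introduced here only for the example.

```python
import math
import random

def imc_step(t_prev, rates, rng=random):
    """Sample one state transition by indirect Monte Carlo (illustrative).

    t_prev : time of the previous transition
    rates  : non-empty dict mapping a transition label, e.g.
             (device, new_state), to its constant rate (hypothetical layout)
    Returns (t_next, label).
    """
    total = sum(rates.values())
    # Step 1: inverse-transform sampling of the holding time, xi ~ U(0, 1).
    xi = rng.random()
    t_next = t_prev - math.log(1.0 - xi) / total
    # Step 2: roulette selection of the transition, zeta ~ U(0, 1).
    zeta = rng.random()
    cumulative = 0.0
    for label, rate in rates.items():
        cumulative += rate / total
        if zeta < cumulative:
            return t_next, label
    return t_next, label  # guard against floating-point round-off
```

Only two uniform random numbers are consumed per transition, regardless of how many units the system contains.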
3. Accelerated Sampling Method for Multistate System Simulation
The results obtained by IMC alone may have large variance, low accuracy, and low simulation efficiency when the probability of the events of interest is small (for example, with high reliability or a low failure rate) or small compared with that of other events (for example, with a short mission time or a repair rate significantly higher than the failure rate). Specific sampling methods are therefore needed.
The two accelerated sampling methods used in this study build on IMC to raise the sampling probability of small-probability events and reduce the variance of the results. For the sequential random walk of the system state, the original system state transition space is the one defined by the probability transition kernel of Section 2.
IMC samples in the original space. An accelerated sampling method builds a new system state transition PDF, thereby changing the sampling space of random events so that small-probability events are sampled more easily. Because the accelerated sampling method changes the system state transition space and the probability of occurrence of events, a state transition weight is computed during each random walk to keep the results unbiased: the weight is initialized to 1 and, at every sampled transition, multiplied by the ratio of the original PDF to the modified PDF evaluated at that transition, as shown in equation (10).
3.1. Forced Transitions
Under the conditions of short mission time and high system reliability, the system rarely fails during the mission time, so mission failures are difficult to sample when simulating such a high-reliability system. To increase the number of failure transition samples within the mission time, Lewis et al. proposed forced transition (FT). Assuming the mission time is T, the sampling space of the state transition must be modified so that the sampled transition time falls before T. The modified PDF is the original transition-time PDF restricted to the remaining mission time and renormalized.
Forced transition therefore changes only the sampling space of the transition time and does not change the sampling space of the transition state. The random walk sequence of the system state is then sampled as in IMC, with equation (6) replaced by the inverse transform of the restricted distribution.
For example, when the transition time obeys the exponential distribution, t can still be calculated in closed form.
The forced transition causes the state transition to occur before T, which artificially increases the probability of a small-probability random event, so the simulated sample count must be corrected. According to equation (12), the probability of event occurrence is enlarged, relative to the original system state transition space, by the reciprocal of the original probability that the transition occurs before T. The simulated sampling count is therefore corrected according to equation (10): the state transition weight is multiplied by that original probability, which has a closed form when the transition obeys the exponential distribution.
The purpose of forced transition in high-reliability system simulation is to increase the sampling probability of failure transitions, but forced transition does not distinguish between failure and repair transitions in the state transition space. Although it makes a state transition occur before the end of the mission, the probability of a repair transition is usually much greater than that of a failure transition, so the system is still unlikely to fail. Forced transition is therefore chosen as the sampling method only when failure transitions alone exist in the system.
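For the exponential case, forced transition and its weight correction can be sketched as below. The sketch assumes a constant total transition rate and a previous transition time earlier than T; the function name and arguments are illustrative and not taken from the paper.

```python
import math
import random

def forced_transition_time(t_prev, total_rate, mission_time, weight, rng=random):
    """Sample a transition time forced to fall before mission_time.

    Assumes an exponential holding time with constant total_rate and
    t_prev < mission_time. Returns (t_next, updated_weight).
    """
    remaining = mission_time - t_prev
    # Probability, in the original space, that the transition occurs in time.
    p_before_end = 1.0 - math.exp(-total_rate * remaining)
    xi = rng.random()
    # Inverse transform of the exponential restricted to [0, remaining).
    holding = -math.log(1.0 - xi * p_before_end) / total_rate
    # Weight correction: original PDF / modified PDF = p_before_end.
    return t_prev + holding, weight * p_before_end
```

For a single nonrepairable unit starting at time 0, every forced sample fails before T and carries the weight 1 − exp(−λT), which is exactly the unreliability at T, so the weighted estimate has zero variance in that special case.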
3.2. Failure Biasing
Under the condition of high system maintainability and short repair time, the system is quickly repaired after a failure, so a repairable system operates normally most of the time without affecting mission success. Following the same idea as forced transition, namely improving the sampling of small-probability events, Lewis also proposed failure biasing (FB). To increase the probability of failure transitions in the next sampling, the sampling space for state transitions is modified by building a new PDF in which x is the failure biasing coefficient, used to modify the relative proportion between failure transitions and repair transitions. Failure biasing does not change the sampling space of the transition time, only that of the transition state. The quantities x and (1 − x) are the proportions of failure transitions and repair transitions, respectively, among all state transitions in this sampling. x usually ranges from 0.5 to 0.7. Failure transitions are then sampled much more often than in the original space. To keep the estimated value unbiased, the simulated sampling count is corrected according to equation (10).
Failure biasing only changes the proportion of failure and repair transitions in the system state transition space and does not change the type of the transition. Therefore, when only failure transitions or only repair transitions exist, failure biasing loses its meaning. Failure biasing is accordingly chosen as the sampling method when both failure and repair transitions exist in the system.
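The selection step under failure biasing can be sketched as follows. The sketch assumes constant rates and assumes that, within the failure group and within the repair group, transitions keep their original relative proportions; that within-group rule, like the function and argument names, is an assumption of this example rather than something stated in the text.

```python
import random

def biased_transition(failure_rates, repair_rates, x, weight, rng=random):
    """Choose the next transition with failure biasing (illustrative).

    failure_rates, repair_rates : non-empty dicts mapping transition labels
        to constant rates (hypothetical layout)
    x : failure biasing coefficient, 0 < x < 1
    Returns (label, updated_weight).
    """
    total_f = sum(failure_rates.values())
    total_r = sum(repair_rates.values())
    total = total_f + total_r
    # Pick the group: failures with probability x, repairs with 1 - x.
    if rng.random() < x:
        group, group_total, group_prob = failure_rates, total_f, x
    else:
        group, group_total, group_prob = repair_rates, total_r, 1.0 - x
    # Within the group, keep the original relative proportions (assumption).
    zeta = rng.random() * group_total
    cumulative = 0.0
    for label, rate in group.items():
        cumulative += rate
        if zeta < cumulative:
            break
    # Weight factor = original probability / biased probability.
    original = rate / total
    biased = group_prob * rate / group_total
    return label, weight * original / biased
```

A failure transition that is rare in the original space is now drawn with probability x, and the small weight it carries restores the original probability in the weighted average.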
3.3. Correction of Statistical Indicators
The statistical indicators in this study are the system availability at any time during the mission and the mission success probability. Because forced transition and failure biasing modify the system state transition sampling space, these indicators cannot be obtained directly by accumulating the number of samples in which the system is available or the mission succeeds. Instead, they are calculated from the statistics of system unavailability and mission failure probability: system availability is one minus system unavailability, and mission success probability is one minus mission failure probability.
N simulation runs produce N random sequences of system state transitions. Because the state transition weights accumulate multiplicatively, the final weight of each random walk sequence is used to correct the statistic: the estimate is the average over the N runs of the product of the state transition weight and an indicator function. The weight is the one calculated for the random walk sequence up to the statistical moment, and the indicator function equals 1 if the mission has failed or the system is unavailable at the statistical moment and 0 otherwise.
It should be noted that the statistical moment of system unavailability differs from that of the mission failure probability. System unavailability can be counted at any time during the mission, whereas the mission failure probability is counted at the end of the mission. The discrete event sequence is not affected by the statistics, so the mission reliability index and the mission success index can be obtained at the same time from one simulation.
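The correction described above amounts to a weighted average of failure indicators. A minimal sketch, with a hypothetical function name and plain lists as inputs:

```python
def weighted_failure_estimate(weights, failed):
    """Estimate unavailability (or mission failure probability) and its complement.

    weights : state transition weight of each simulated random walk,
              taken at the statistical moment
    failed  : matching booleans, True if the mission failed (or the system
              was unavailable) at the statistical moment
    Returns (failure_estimate, success_estimate).
    """
    if len(weights) != len(failed) or not weights:
        raise ValueError("weights and failed must be non-empty and equal in length")
    n = len(weights)
    # Only failed runs contribute; each contributes its weight, not a count of 1.
    failure = sum(w for w, f in zip(weights, failed) if f) / n
    return failure, 1.0 - failure
```

For example, four runs with weights 0.002, 0.5, 0.002, and 1.0, of which the first and third failed, give a failure estimate of 0.004 / 4 = 0.001 and a success estimate of 0.999.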
4. Analysis and Results for Typical Complex Multistate Mission Systems
The main missions of a warship are navigation and combat. Systems that work in different mission profiles may differ in type, number, and functional structure. Warship equipment includes two major types: platform systems and combat systems. The former mainly includes the propulsion, power, hull, communication and navigation, and warship support subsystems, among which the propulsion and power subsystems are the key parts; the latter mainly includes various combat subsystems such as missiles, naval guns, and torpedoes. Due to space limitations, only the ship’s propulsion and power subsystems in the navigation stage are used as research objects in the application examples. The same calculations also apply to other systems and their phased missions.
4.1. Multistate System
The structure of the propulsion and power subsystems during the navigation is simplified in this example to make the example not too complicated. The system structure during navigation is shown in Figure 2.
In particular, the system structure changes as the phases switch in PMS. In this example, it is assumed that warship navigation missions may be interspersed with special situations such as small-scale combats. The system structures in navigation and combat are shown in Figures 2 and 3, respectively. Diesel engines are no longer used in the combat phase, and gas turbines are required to be intact. For example, a gas turbine can output 80% of its power in the degradation state, but it cannot meet the mission requirements, so the system is still considered as unusable.
All devices in Figures 2 and 3 are multistate devices. The devices shown in gray, namely the diesel engines and gas turbines, have multiple performance output levels, including a reduced-power state. The other devices do not have multiple performance output levels. Figure 4 is a state transition diagram for the different device types. Diesel engines and gas turbines have four states: 3-intact, 2-degradation, 1-general fault, and 0-fatal fault; the remaining equipment has three states: 2-intact, 1-general fault, and 0-fatal fault. Intact indicates that the device can run at full power; degradation indicates that the device can only run at a certain percentage of full power; general faults are repairable faults; fatal faults are unrepairable faults. λ and μ represent the failure and repair transition rates between the corresponding states.
The speed of a ship is related to the power output rate of the propulsion system, which is determined by the state of the power system. The subsystem state should be determined by combining Figure 4 and Table 1. The power output characteristics of the propulsion system are mainly determined by the working state and functional structure of the diesel engines and gas turbines. Given the power output coefficient, the corresponding maximum output power is determined by equation (20).
For simplicity, the power quantities in equation (20) can be directly replaced by the corresponding speeds.
4.2. Mission Description and Mission Success Decision
A warship can neither arrive early nor exceed the prescribed time. Arriving early may take it outside the formation’s operational scope and expose it; arriving after the prescribed time range makes it difficult to complete the combat mission. In the simulation, the average speed required over the remaining distance and the time interval ∆t corresponding to a random event are used to calculate the distance traveled during that interval. The required average speed is the remaining distance (the mission distance minus the current cumulative distance) divided by the remaining time (the mission time minus the current time t). The mission distance is set to the product of the distance relaxation coefficient ϑ (0 < ϑ < 1), the maximum speed at full power, and the mission time. The value of ϑ is determined by the urgency of the mission: the more urgent the mission, the closer ϑ is to 1. In practice, a warship usually has a certain economic cruise speed, and its speed depends on the power output of the propulsion system and the speed rules. When the required average speed is lower than the economic speed, the ship sails at the economic speed. The speed rule involves the power output coefficient θ (0 < θ < 1), whose values are shown in Table 1. Note that when the propulsion system is degraded during the mission, the required average speed may temporarily exceed the speed attainable at the current power output, so that the remaining distance cannot be completed on time. Once the repair is completed, the power output is restored to full and the remaining distance can again be completed on time. Therefore, OS is judged against 100% power output during the mission: if the required average speed is greater than the maximum speed at full power, the mission fails; otherwise, the mission succeeds. When the system remains unable to recover 100% power output, OS is judged against the speed attainable at the current power output: if the required average speed is greater than that speed, the mission fails; otherwise, the mission succeeds.
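The distance and speed bookkeeping described above can be sketched as follows. The mission distance and required average speed follow directly from the definitions in the text; the clamping rule in `sailing_speed` (at least the economic speed, at most θ times the full-power maximum speed) is an assumed reading of the speed rule, and all function and parameter names are illustrative.

```python
def mission_distance(relax, v_max, mission_time):
    """Mission distance: relaxation coefficient * full-power speed * mission time."""
    return relax * v_max * mission_time

def required_speed(distance, covered, mission_time, t):
    """Average speed needed over the remaining distance; requires t < mission_time."""
    return (distance - covered) / (mission_time - t)

def sailing_speed(v_required, v_econ, v_max, theta):
    """Assumed speed rule: at least the economic speed, at most theta * v_max."""
    return min(max(v_required, v_econ), theta * v_max)

def is_over_speed(v_required, v_max):
    """OS check against 100% power output, as described in the text."""
    return v_required > v_max
```

With hypothetical values of a 30-knot full-power speed, an 18-knot economic speed, ϑ = 0.8, and a 120 h mission, the mission distance is 2880 nautical miles and the initial required average speed is 24 knots; at θ = 0.5 the ship would be limited to 15 knots until repaired.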
Considering the complexity of the mission, this study sets three types of scenarios:
(1) Scenario I: the basic navigation mission
(2) Scenario II: a combination of basic missions that takes into account the randomness of mission time over a longer period
(3) Scenario III: random phased missions in the integrated navigation and combat phases
4.2.1. Scenario I
The goals of the navigation mission are set as follows: ① arriving at the destination at the prescribed time; ② still being in the available state (assuming that all states except downtime are available) at the end of the navigation mission. The former is a time and space requirement for completing the mission, which is related to the system’s application mode and performance output. The latter is a requirement on system reliability. Since a warship is a repairable system, system availability measures its reliability. A mission is judged successful only if both goals are met. Mission success in practical situations can be very complicated, so whether the mission goal is achieved is decided with the mission success decision rules shown in Table 2.
In Table 2, over speed (OS) means that, because of device failures and maintenance, the warship repeatedly stopped and resumed, so that it could not reach the target area on time even when sailing at the maximum speed. Unfinished maintenance (UM) means that at the end of the mission the warship system is down and unavailable because of a general failure: the system is under repair but temporarily unavailable. Fatal fault (FF) means that a fatal fault of some critical devices (usually with little or no backup) occurs during the navigation mission, making the warship system unavailable and unable to continue the mission.
4.2.2. Scenario II
The goal of scenario II is the same as that of scenario I (see Table 2); scenario II extends scenario I in the time dimension. The first change is to expand the mission from one specific mission to a series of missions over a long period, so that the mission success probability is assessed over that period. The selected index is the average mission success probability. The second change is to consider the uncertainty of mission time.
Assume that single missions are divided into three types: short, medium, and long. The mission time of each type obeys a random distribution. All three types may occur over a long period, and one type is randomly selected each time a mission is executed. The mission times are accumulated, and when the accumulated time exceeds the set value, the warship enters a rest state. The average mission failure probability of the i-th simulation is the weighted sum of the failure outcomes of the missions executed in that simulation, where each mission m in the i-th simulation has its own weight. Assuming that all missions are equally important, every weight equals the reciprocal of the number of missions executed in the i-th simulation.
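The per-simulation average described above can be sketched as follows; the function name and the list-based inputs are hypothetical, and equal weights are used by default to reflect the assumption that all missions are equally important.

```python
def average_mission_failure(failures, weights=None):
    """Average mission failure value of one simulation run (illustrative).

    failures : per-mission failure values, e.g. 1 for a failed mission and
               0 for a successful one, or weight-corrected values
    weights  : per-mission importance weights; defaults to 1/M for M missions
    """
    m = len(failures)
    if m == 0:
        raise ValueError("at least one mission is required")
    if weights is None:
        weights = [1.0 / m] * m
    if len(weights) != m:
        raise ValueError("weights and failures must have the same length")
    return sum(w * f for w, f in zip(weights, failures))
```

A run of four missions in which only the first fails therefore contributes an average failure value of 0.25.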
4.2.3. Scenario III
According to the characteristics of scenario III, it is assumed that the ship randomly enters the combat phase after the navigation phase has lasted for a period. The goals of scenario III are therefore extended to: ① arriving at the destination at the prescribed time; ② still being in the available state (assuming that all states except downtime are available) at the end of the navigation mission; ③ being able to move at high speed at any time during the combat. A mission is judged successful only if all three goals are met. The specific evaluation criteria are shown in Table 3.
The first four types of mission success index in Table 3 are consistent with Table 2. Failure in combat (FC) means that during the combat phase, the ship system is unavailable, due to various failures and degradations, so it cannot meet the high-speed maneuverability requirements.
The combat duration can be considered to follow a normal distribution, but sampling with a normal distribution may draw negative values, so Weibull distribution or lognormal distribution may be used. So it is assumed that the navigation duration follows an exponential distribution, the combat duration follows a Weibull distribution, and at most one combat occurs during the mission. The navigation duration and combat duration should be determined according to the intensity of mission.
4.3. Simulation Flowchart
Figure 5 shows the general system simulation framework for complex missions. The lowest level (unit) of the system runs discrete event simulations that drive changes in the upper system state. Generating each discrete event involves three modules: the sampling method selection module, the system state transition module, and the mission success decision module. The sampling method selection module selects a suitable sampling method according to the current system state. The system state transition module generates the next random system state transition event according to the selected sampling method, and the unit state variables and upper system state variables are updated at the same time. The mission success decision module determines the system performance output from the current system state, updates the completed and uncompleted workload to advance the mission, and then decides whether the mission success conditions are met. Finally, mission reliability and success probability are counted. The simulation flow differs in detail for different types of mission, but it always follows the three modules of the general framework. In other words, if the problem changes, only the content of the corresponding module needs to be modified.
(1) Figure 6 shows the detailed mission simulation process. The sampling method is determined according to the reachable set of transitions in the current system state, and n is the total number of units in the system. The generated random event is represented by three elements: Δt is the occurrence time of the transition, I is the index of the device that undergoes the state transition, and E is the type of state transition. When determining the success or failure type of the mission, OS is determined if the required average speed exceeds the attainable maximum speed. The current state of the system is then calculated. If the system is unavailable and cannot be repaired, FF is determined. If the mission time has been reached and the system is unavailable and under maintenance, UM is determined.
(2) The brief simulation flow of scenario II is shown in Figure 7, and the specific process of a single mission simulation is shown in Figure 6.
(3) The brief simulation flow of scenario III is shown in Figure 8, which uses the start and end times of the combat phase. When the mission starts, phase = ‘sailing’. The rest of the flow is shown in Figure 6.
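The decision order described for Figure 6 can be sketched as a small classifier. The order of the checks follows the text above; the function signature, the boolean inputs, and the return value `None` for a mission that is still running are assumptions of this sketch, and the label "MS" for mission success is borrowed from the result tables.

```python
def classify_mission(v_required, v_max, available, repairable, mission_over):
    """Return 'OS', 'FF', 'UM', 'MS', or None while the mission is still running.

    v_required   : average speed needed to finish the remaining distance on time
    v_max        : attainable maximum speed used for the OS check
    available    : True if the system is currently available
    repairable   : True if an unavailable system can still be repaired
    mission_over : True once the mission time has been reached
    """
    if v_required > v_max:
        return "OS"  # cannot arrive on time even at maximum speed
    if not available and not repairable:
        return "FF"  # fatal fault of a critical device
    if mission_over and not available:
        return "UM"  # still under repair at the end of the mission
    if mission_over:
        return "MS"  # all goals met
    return None      # keep simulating
```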
4.4. Data Settings
The assumptions of the example are as follows: ① state transitions obey a Markov process; ② the state durations before failure transitions such as faults and degradations obey the exponential distribution; ③ the state durations before repair transitions obey the exponential distribution; and ④ all repairs are perfect (as good as new), so the device returns to the intact state after repair, as shown in Figure 4.
4.4.1. System Parameter Settings
Table 4 gives the reliability and maintainability parameters of the devices involved in the simulation, assuming that devices of the same kind have the same parameters. The rates in the table are indexed by m, the state to which the device transitions. All three types of mission use these system parameters.
4.4.2. Mission and Simulation Parameter Settings
Set economic speed , full power maximum speed , and distance relaxation coefficient . The mission time ranges over [1 h, 720 h]. Simulation times and failure biasing coefficient .
In particular, for scenario II, the mission period , and the short, medium, and long types of mission time obey a triangular distribution, with most probable values of 20 h, 120 h, and 720 h, respectively. The maximum and minimum values are 20% above and below the most probable value, respectively.
For scenario III, assuming that the combat phase can only occur 120 hours after the start of the navigation phase, then the combat phase start time and . Combat duration . The expectation and variance of the combat duration are about 100 and 100, respectively.
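The mission time sampling for scenario II can be sketched with the standard library. The most probable values and the ±20% bounds come from the text; the `intensity` argument (relative proportion of short, medium, and long missions), the `period` argument, and the choice to count the mission that crosses the period limit are assumptions of this sketch.

```python
import random

# Most probable mission times in hours, as stated in the text.
MODE_HOURS = {"short": 20.0, "medium": 120.0, "long": 720.0}

def sample_mission_time(intensity=(1, 1, 1), spread=0.2, rng=random):
    """Pick a mission type and draw its duration from a triangular distribution."""
    kind = rng.choices(list(MODE_HOURS), weights=intensity, k=1)[0]
    mode = MODE_HOURS[kind]
    duration = rng.triangular(mode * (1.0 - spread), mode * (1.0 + spread), mode)
    return kind, duration

def generate_missions(period, intensity=(1, 1, 1), rng=random):
    """Draw missions until the accumulated mission time exceeds `period`."""
    missions, elapsed = [], 0.0
    while elapsed <= period:
        kind, duration = sample_mission_time(intensity, rng=rng)
        missions.append((kind, duration))
        elapsed += duration
    return missions
```

Calling `generate_missions(5000.0, intensity=(1, 1, 2))`, for instance, would produce a mission sequence in which long missions are twice as likely as each of the other types.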
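The timing of the combat phase can be sketched as below. Since the distribution parameters are not available in the text here, they are left as function arguments; the reading that the combat start time is the 120 h offset plus an exponentially distributed delay is an assumption, as are all names. The helper `weibull_mean_var` uses the standard Weibull moment formulas and can be used to check candidate scale and shape values against the stated expectation and variance.

```python
import math
import random

def weibull_mean_var(scale, shape):
    """Mean and variance of a Weibull distribution with the given scale and shape."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale ** 2 * (g2 - g1 ** 2)

def sample_combat_window(nav_rate, scale, shape, earliest=120.0, rng=random):
    """Return (start, end) of the combat phase in hours (illustrative).

    nav_rate     : rate of the exponential delay before combat (assumed role)
    scale, shape : Weibull parameters of the combat duration
    earliest     : earliest possible combat start, 120 h in the text
    """
    start = earliest + rng.expovariate(nav_rate)
    end = start + rng.weibullvariate(scale, shape)
    return start, end
```

Unlike a normal distribution, the Weibull draw is never negative, which is the reason given in the text for preferring it.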
4.5. Example Simulation Results and Analysis
4.5.1. Scenario I
A comparative simulation test was performed with and without the accelerated sampling method. The comparison covers three aspects: ① the mission success index and the availability at the end of the mission calculated by the two simulation methods; ② the convergence speed of the mission success index calculated by the two simulation methods; ③ the system availability obtained by simulation and by the analytical method. Here, “without accelerated sampling method” means that only the direct Monte Carlo (DMC) is used. The analytical method uses Markov state transition equations and universal generating functions; the details are given in the appendix.
In this example, the dynamic behavior of the system, the state transitions, and the specific missions are combined, which makes the problem more complicated, and mission success results are not easy to obtain by analytical methods. However, the system availability does not depend on the mission process, so it can be solved analytically and compared with the simulation to verify the accuracy of the simulation method.
Table 5 shows the results of the mission success index and the availability calculation at the end of the mission with and without accelerated sampling at the mission time of 20 h, 120 h, and 720 h. Table 6 shows the mission success index and the availability calculation result at the end of the mission under different distance relaxation coefficients when the mission time is 120h. Some conclusions can be drawn by analyzing the data:(1)Judging from the mission success index, in mission failure types, OS has the highest proportion, followed by UM, and FF has the smallest proportion. And the probability of mission failure is greater when long missions are executed, and the proportion of several failure types increased at the same time.(2)Judging from the system availability at the end of the mission, both methods can realistically reflect the system availability, but the simulation results using the accelerated sampling method are closer to the analytical results.(3)For some small-probability events, for example, when the mission time is short, the probability of FF is difficult to sample without the accelerated sampling method. It should also be noted that when the simulation time is 104, the precise digits can reach any decimal place with the accelerated sampling method; the precise digits can only reach 10−4 without the accelerated sampling method.(4)It can be seen from Table 6 that as ϑ increases, MS decreases, and the main cause of mission failure is OS. When ϑ = 1, the system is not allowed to go down, which is equivalent to measuring the inherent reliability of the system. ϑ is a parameter related to the mission, which only affects the successful determination of the mission and does not affect the availability of the system. Therefore, under different ϑ, SA remains the same.
Figures 9(a) and 9(b), 10(a) and 10(b), and 11(a) and 11(b) show the convergence of the mission success index and the system availability, respectively, at different mission times. The following conclusions can be drawn from the figures:
(1) In terms of convergence speed, the accelerated sampling method converges faster, and its results stabilize before 5,000 simulations. This is even more apparent for short mission times. The convergence of FF in Figures 9(a) and 10(a) shows that the accelerated sampling method obtains more samples of the event than the nonaccelerated method, which greatly improves the sampling efficiency and makes the calculation result more accurate.
(2) In terms of the comparison between the simulation and analytical methods, the instantaneous availability curves produced by the two simulation methods are consistent with the analytical curve, which verifies the correctness of the simulation method. In addition, the curve obtained with the accelerated sampling method fluctuates less, which also demonstrates the advantage of the method to some extent.
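The kind of check described in point (2) can be sketched on a much simpler case than the system studied here. The example below assumes a single two-state repairable component with exponential failure and repair times, for which the instantaneous availability has a well-known closed form, and compares it with a crude simulation. The MTBF and MTTR values and the function names are hypothetical.

```python
import math
import random


def analytic_availability(mtbf, mttr, t):
    """Instantaneous availability of a two-state Markov component that starts up."""
    lam, mu = 1.0 / mtbf, 1.0 / mttr
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)


def simulated_availability(mtbf, mttr, t, n, rng):
    """Fraction of n simulated histories in which the component is up at time t."""
    up_count = 0
    for _ in range(n):
        clock, up = 0.0, True
        while True:
            # Sojourn time in the current state (failure if up, repair if down).
            clock += rng.expovariate(1.0 / mtbf if up else 1.0 / mttr)
            if clock > t:
                break  # the state holding at time t is the current one
            up = not up
        up_count += up
    return up_count / n


if __name__ == "__main__":
    rng = random.Random(7)
    for t in (20.0, 120.0, 720.0):  # mission times used in scenario I
        a = analytic_availability(100.0, 10.0, t)
        s = simulated_availability(100.0, 10.0, t, 20_000, rng)
        print(f"t={t:6.1f} h  analytic={a:.4f}  simulated={s:.4f}")
```

Agreement between the two columns within the statistical scatter of the simulation plays the same role as the agreement between the simulated and analytical curves in the figures.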
4.5.2. Scenario II
In Section 4.5.1, the correctness and efficiency of the simulation method in this paper have been verified, so the scenario II model is simulated using the accelerated sampling method. The influence of mission factors on mission success is reflected by changing the mission intensity. Mission intensity is described by the proportion of short, medium, and long missions. The higher the proportion of long missions in the mission cycle, the greater the mission intensity. Taking the result of the 1 : 1 : 1 mission intensity as the reference value, the comparison experiment was performed by changing the mission intensity. The results are shown in Table 7.
The impact of system reliability and maintainability on mission success is reflected by changing the MTBF and MTTR of the devices. Under the mission intensity of 1 : 1 : 1, the MTBF is increased by 10%, 20%, and 30% relative to its normal value; the comparative results are shown in Table 8. Likewise, the MTTR is reduced by 10%, 20%, and 30% relative to its normal value; the comparative results are shown in Table 9.
It can be seen from Table 7 that the larger the proportion of long missions, the smaller the average mission success probability, and that the main cause is OS. The failure probabilities of all types increase with the proportion of long missions, but their errors relative to the reference value differ. FF has the largest error and is the most sensitive to the mission intensity factor; UM has the smallest error and is the least sensitive.
It can be seen from Tables 8 and 9 that the better the reliability and maintainability, the greater the average mission success probability. The failure probabilities of the various types decrease as system reliability improves, but by different amounts. OS is the most strongly affected by reliability and maintainability factors, because high reliability and maintainability reduce the probability of general failures and shorten the maintenance time, thereby reducing the probability and duration of downtime. FF is the least affected. It is worth noting that as maintainability improves, both OS and UM decrease, which is in line with expectations, but FF increases. A possible explanation is that better maintainability shortens the maintenance time, so the system returns to operation sooner and is exposed for longer to the transition to fatal failure (the absorbing state), which increases the occurrence of FF.
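The explanation offered for the increase of FF can be checked for plausibility on a minimal three-state Markov model. The sketch assumes states up, down (repairable), and fatal (absorbing), with the fatal transition possible only from the up state; this structure, the parameter names (`mttff` for the mean time to fatal failure), and all numerical values are assumptions for illustration and are not the system model of the example.

```python
def state_probs(mtbf, mttr, mttff, mission_time, h=0.1):
    """Return (p_up, p_down, p_fatal) at mission_time using fixed-step RK4."""
    lam_g, mu, lam_f = 1.0 / mtbf, 1.0 / mttr, 1.0 / mttff

    def deriv(p):
        up, down, _fatal = p
        return (
            -(lam_g + lam_f) * up + mu * down,  # leave up by either failure, return by repair
            lam_g * up - mu * down,             # general failure in, repair out
            lam_f * up,                         # absorbing fatal state, fed only from up
        )

    def shifted(p, k, scale):
        return tuple(x + scale * dx for x, dx in zip(p, k))

    p = (1.0, 0.0, 0.0)  # the system starts in the up state
    for _ in range(round(mission_time / h)):
        k1 = deriv(p)
        k2 = deriv(shifted(p, k1, 0.5 * h))
        k3 = deriv(shifted(p, k2, 0.5 * h))
        k4 = deriv(shifted(p, k3, h))
        p = tuple(
            x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        )
    return p


if __name__ == "__main__":
    for mttr in (10.0, 9.0, 8.0, 7.0):  # hypothetical baseline and 10/20/30% reductions
        up, down, fatal = state_probs(100.0, mttr, 5000.0, 720.0)
        print(f"MTTR={mttr:4.1f} h  P(down)={down:.4f}  P(fatal)={fatal:.4f}")
```

In this toy model, shortening the MTTR lowers the probability of being down but raises the probability of having reached the fatal state by the end of the mission, because the system spends more time in the state from which the fatal transition can occur.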
4.5.3. Scenario III
Figure 12 shows the trends of the mission success indicators and system reliability with respect to mission time. Figure 12(a) shows that the main causes of mission failure are FC and OS. The FC curve rises sharply between about 150 h and 350 h. This is because the combat phase begins at a random time after 120 h from the start of the mission and places higher requirements on system performance. If the system performance does not meet the mission requirements, the mission is judged to have failed, so the mission success decreases rapidly. Figure 12(b) compares the availability curves of the PMS and the basic navigation mission. On entering the combat phase, the current reliability and maintainability conditions cannot restore the system to the performance output standard required by the combat phase, and the availability declines rapidly. Figure 12(b) thus confirms the cause of the sharp rise of the FC curve between about 150 h and 350 h. If environmental stress is considered, the failure rate of all equipment during the combat phase will be higher than normal. Figure 12(c) shows the availability curves for different failure rates during the combat phase, where the basic failure rate is the same as in the navigation phase. It can be seen that environmental stress has a substantial impact on system reliability.
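How a phase with a higher performance requirement turns degraded operation into a mission failure can be sketched as a simple check of a performance history against phase demands. The sketch is deliberately simplified: any shortfall counts as a failure, and the downtime tolerance used for the other failure types is ignored. The demand levels, phase boundaries, and function names are hypothetical.

```python
from bisect import bisect_right


def level_at(trajectory, t):
    """Performance level at time t of a piecewise-constant trajectory.

    trajectory is a list of (start_time, level) pairs sorted by start_time,
    with the first segment starting at time 0.
    """
    starts = [start for start, _ in trajectory]
    return trajectory[bisect_right(starts, t) - 1][1]


def first_violation(trajectory, phases):
    """Earliest time at which performance falls below the phase demand, or None.

    phases is a list of (start, end, demand) tuples, sorted and nonoverlapping.
    """
    for start, end, demand in phases:
        # A shortfall can only begin at the phase start or at a change of
        # performance level inside the phase.
        candidates = [start] + [s for s, _ in trajectory if start < s < end]
        for t in candidates:
            if level_at(trajectory, t) < demand:
                return t
    return None


if __name__ == "__main__":
    # Hypothetical history: full performance, then degraded to 80% at 200 h.
    history = [(0.0, 1.0), (200.0, 0.8)]
    navigation_only = [(0.0, 720.0, 0.8)]
    with_combat = [(0.0, 120.0, 0.8), (120.0, 320.0, 1.0), (320.0, 720.0, 0.8)]
    print(first_violation(history, navigation_only))
    print(first_violation(history, with_combat))
```

The same degraded history satisfies the navigation-only mission but violates the demand once a combat phase with a full-performance requirement covers the time of the degradation.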
The impact of system reliability and maintainability parameters and of mission parameters on the mission success index is observed by changing the MTBF, the MTTR, and the combat duration. The results show that all of these factors affect the mission success index, but to different degrees and for different reasons. The MTBF is increased in different proportions for comparative analysis, and the result is shown in Figure 13(a). After system reliability is improved, the system is less prone to going down, so the probabilities of all mission failure types are reduced. The MTTR is decreased in different proportions for comparative analysis, and the result is shown in Figure 13(b). After system maintainability is improved, OS decreases relatively more than the other failure types. First, improving maintainability does not affect unrepairable faults in the system, so FF is not affected. Second, if the system goes down during the combat phase, the mission fails, so FC is not affected. Finally, because UM is itself a small-probability event, its impact on MS is not obvious. The combat duration is changed for comparative analysis, and the results are shown in Figure 13(c). After the combat duration is increased, FC increases significantly, in line with expectations. In practice, mission factors are uncontrollable, so the mission success probability can only be increased by improving system reliability and maintainability. In this scenario, improving reliability is the more effective of the two.
The proposed simulation method outputs both the mission success probability and the system availability from a single set of simulation runs and can effectively solve the mission reliability and success evaluation problems under complex missions. By introducing forced transition and failure biasing, simulation efficiency is effectively improved for basic missions with different mission times. When the mission time is short, the sampling efficiency is significantly improved for all statistical indicators. When the mission time is long, the sampling efficiency of the statistical indexes corresponding to small-probability events is also significantly improved. For combinations of ferry missions with different mission times, increasing the proportion of long missions in the mission structure reduces the mission success probability. Improving the reliability of system units effectively improves system availability and mission success probability. Improving maintainability improves the usable state of repairable system units, but the repaired system is still subject to fatal failures (absorbing states); in longer missions in particular, the exposure to fatal failures increases, which limits system availability and mission success probability. A critical phase with higher system requirements has a greater impact on the mission success of a PMS. The simulation method proposed in this paper can identify such critical phases and provide a basis for maintenance strategies. In addition, real systems exhibit more complex failure mechanisms, such as competing failures, cascading failures, and failure or degradation interval distributions that depend on cumulative working time. These factors will be considered in further research.
The analytical method used in scenario I is as follows. Assuming that device j has m_j states, the Markov state transfer equation of the device is
Solving this equation gives the probability of each state at time t. After the z-transform, the PDF of device j is expressed uniformly as u_j(t):
The PDF of the system state can be calculated by equation (A.5). ϕ is a structural logical relation operator, which is related to the system structure and behavior mode.
The PDF of the system state at time t in the example should be
where p1, p2, and p3 represent the probabilities that the system performance is 100%, 80%, and 0%, respectively, and (p1 + p2) is the probability that the system is available.
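For orientation, the standard forms that a derivation of this kind typically takes are sketched below. This is a generic universal generating function formulation written under stated assumptions, not a reproduction of the paper's own equations: the transition-rate matrix Q_j, the performance levels g_{ji}, the number of devices n, and the explicit z argument are symbols introduced here for illustration.

```latex
% State probabilities of device j (Kolmogorov forward equation);
% Q_j is the assumed m_j x m_j transition-rate matrix of the device.
\frac{d\mathbf{p}_j(t)}{dt} = \mathbf{p}_j(t)\,\mathbf{Q}_j, \qquad
\mathbf{p}_j(t) = \bigl[p_{j1}(t), \dots, p_{jm_j}(t)\bigr], \qquad
\sum_{i=1}^{m_j} p_{ji}(t) = 1

% z-transform of the performance distribution of device j;
% g_{ji} is the assumed performance level of device j in state i.
u_j(z,t) = \sum_{i=1}^{m_j} p_{ji}(t)\, z^{g_{ji}}

% System-level distribution obtained with the structure operator \phi.
U(z,t) = \phi\bigl(u_1(z,t), \dots, u_n(z,t)\bigr)

% Example system with performance levels 100%, 80%, and 0%,
% and the resulting availability.
U(z,t) = p_1(t)\, z^{100\%} + p_2(t)\, z^{80\%} + p_3(t)\, z^{0\%}, \qquad
A(t) = p_1(t) + p_2(t)
```

The last line uses only the quantities named in the text: p1, p2, and p3 are the probabilities of the three performance levels, and the availability is their sum over the levels at which the system is considered available.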
The MATLAB code used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China (71401171) and the China Postdoctoral Science Foundation (2019M653925).
M. Alam and U. M. Al-Saggaf, “Quantitative reliability evaluation of repairable phased-mission systems using Markov approach,” IEEE Transactions on Reliability, vol. 35, no. 5, pp. 498–503, 1987.
E. Zio, The Monte Carlo Simulation Method for System Reliability and Risk Analysis, Springer-Verlag, London, UK, 2013.
X. J. Su and X. Z. Lyu, “Reliability analysis of phased-mission system with multiple states based on discrete event simulation,” Acta Armamentarii, vol. 38, no. 4, pp. 776–784, 2017.
J. Faulin, A. A. Juan, S. Martorell et al., Simulation Methods for Reliability and Availability of Complex Systems, Springer-Verlag, London, UK, 2010.